Gene Expression Profile Algorithm and Test for Determining Prognosis of Prostate Cancer

ABSTRACT

The present invention provides algorithm-based molecular assays that involve measurement of expression levels of genes, or their co-expressed genes, from a biological sample obtained from a prostate cancer patient. The genes may be grouped into functional gene subsets for calculating a quantitative score useful to predict a likelihood of a clinical outcome for a prostate cancer patient.

RELATED APPLICATIONS

This application claims the benefit of priority of U.S. Provisional Application Nos. 61/593,106, filed Jan. 31, 2012; 61/672,679, filed Jul. 17, 2012; and 61/713,734, filed Oct. 15, 2012, all of which are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to molecular diagnostic assays that provide information concerning gene expression profiles to determine prognostic information for cancer patients. Specifically, the present disclosure provides an algorithm comprising genes, or co-expressed genes, the expression levels of which may be used to determine the likelihood that a prostate cancer patient will experience a positive or a negative clinical outcome.

INTRODUCTION

The introduction of prostate-specific antigen (PSA) screening in 1987 has led to the diagnosis and aggressive treatment of many cases of indolent prostate cancer that would never have become clinically significant or caused death. The reason for this is that the natural history of prostate cancer is unusual among malignancies in that the majority of cases are indolent and even if untreated would not progress during the course of a man's life to cause suffering or death. While approximately half of men develop invasive prostate cancer during their lifetimes (as detected by autopsy studies) (B. Halpert et al., Cancer 16: 737-742 (1963); B. Holund, Scand J Urol Nephrol 14: 29-35 (1980); S. Lundberg et al., Scand J Urol Nephrol 4: 93-97 (1970); M. Yin et al., J Urol 179: 892-895 (2008)), only 17% will be diagnosed with prostate cancer and only 3% will die as a result of prostate cancer. Cancer Facts and Figures. Atlanta, Ga.: American Cancer Society (2010); JE Damber et al., Lancet 371: 1710-1721 (2008).

However, currently, over 90% of men who are diagnosed with prostate cancer, even low-risk prostate cancer, are treated with either immediate radical prostatectomy or definitive radiation therapy. MR Cooperberg et al., J Clin Oncol 28: 1117-1123 (2010); MR Cooperberg et al., J Clin Oncol 23: 8146-8151 (2005). Surgery and radiation therapy reduce the risk of recurrence and death from prostate cancer (AV D'Amico et al., Jama 280: 969-974 (1998); M Han et al., Urol Clin North Am 28: 555-565 (2001); WU Shipley et al., Jama 281: 1598-1604 (1999); AJ Stephenson et al., J Clin Oncol 27: 4300-4305 (2009)); however, estimates of the number of men that must be treated to prevent one death from prostate cancer range from 12 to 100. A Bill-Axelson et al., J Natl Cancer Inst 100: 1144-1154 (2008); J Hugosson et al., Lancet Oncol 11: 725-732 (2010); LH Klotz et al., Can J Urol 13 Suppl 1: 48-55 (2006); S Loeb et al., J Clin Oncol 29: 464-467 (2011); FH Schroder et al., N Engl J Med 360: 1320-1328 (2009). This over-treatment of prostate cancer comes at a cost of money and toxicity. For example, the majority of men who undergo radical prostatectomy suffer incontinence and impotence as a result of the procedure (MS Litwin et al., Cancer 109: 2239-2247 (2007); MG Sanda et al., N Engl J Med 358: 1250-1261 (2008)), and as many as 25% of men regret their choice of treatment for prostate cancer. FR Schroeck et al., Eur Urol 54: 785-793 (2008).

One of the reasons for the over-treatment of prostate cancer is the lack of adequate prognostic tools to distinguish men who need immediate definitive therapy from those who are appropriate candidates to defer immediate therapy and undergo active surveillance instead. For example, of men who appear to have low-risk disease based on the results of clinical staging, pre-treatment PSA, and biopsy Gleason score, and have been managed with active surveillance on protocols, 30-40% experience disease progression (diagnosed by rising PSA, an increased Gleason score on repeat biopsy, or clinical progression) over the first few years of follow-up, and some of them may have lost the opportunity for curative therapy. HB Carter et al., J Urol 178: 2359-2364 and discussion 2364-2355 (2007); MA Dall'Era et al., Cancer 112: 2664-2670 (2008); L Klotz et al., J Clin Oncol 28: 126-131 (2010). Also, of men who appear to be candidates for active surveillance, but who undergo immediate prostatectomy anyway, 30-40% are found at surgery to have higher risk disease than expected as defined by having high-grade (Gleason score of 3+4 or higher) or non-organ-confined disease (extracapsular extension (ECE) or seminal vesicle involvement (SVI)). S L et al., J Urol 181: 1628-1633 and discussion 1633-1624 (2009); CR Griffin et al., J Urol 178: 860-863 (2007); PW Mufarrij et al., J Urol 181: 607-608 (2009).

Estimates of recurrence risk and treatment decisions in prostate cancer are currently based primarily on PSA levels and/or clinical tumor stage. Although clinical tumor stage has been demonstrated to have a significant association with outcome, sufficient to be included in pathology reports, the College of American Pathologists Consensus Statement noted that variations in approach to the acquisition, interpretation, reporting, and analysis of this information exist. C. Compton, et al., Arch Pathol Lab Med 124:979-992 (2000). As a consequence, existing pathologic staging methods have been criticized as lacking reproducibility and therefore may provide imprecise estimates of individual patient risk.

SUMMARY

This application discloses molecular assays that involve measurement of expression level(s) of one or more genes or gene subsets from a biological sample obtained from a prostate cancer patient, and analysis of the measured expression levels to provide information concerning the likelihood of a clinical outcome. For example, the likelihood of a clinical outcome may be described in terms of a quantitative score based on clinical or biochemical recurrence-free interval, overall survival, prostate cancer-specific survival, upstaging/upgrading from biopsy to radical prostatectomy, or presence of high grade or non-organ confined disease at radical prostatectomy.

In addition, this application discloses molecular assays that involve measurement of expression level(s) of one or more genes or gene subsets from a biological sample obtained to identify a risk classification for a prostate cancer patient. For example, patients may be stratified using expression level(s) of one or more genes associated, positively or negatively, with positive clinical outcome of prostate cancer, or with a prognostic factor. In an exemplary embodiment, the prognostic factor is Gleason score.

The present invention provides a method of predicting the likelihood of a clinical outcome for a patient with prostate cancer comprising determination of a level of one or more RNA transcripts, or an expression product thereof, in a biological sample containing tumor cells obtained from the patient, wherein the RNA transcript, or its expression product, is selected from the 81 genes shown in FIG. 1 and listed in Tables 1A and 1B. The method comprises assigning the one or more RNA transcripts, or an expression product thereof, to one or more gene groups selected from a cellular organization gene group, a basal epithelial gene group, a stress response gene group, an androgen gene group, a stromal response gene group, and a proliferation gene group. The method further comprises calculating a quantitative score for the patient by weighting the level of the one or more RNA transcripts or an expression product thereof, by their contribution to a clinical outcome and predicting the likelihood of a clinical outcome for the patient based on the quantitative score. In an embodiment of the invention, an increase in the quantitative score correlates with an increased likelihood of a negative clinical outcome.

In a particular embodiment, the one or more RNA transcripts, or an expression product thereof, is selected from BIN1, IGF1, C7, GSN, DES, TGFB1I1, TPM2, VCL, FLNC, ITGA7, COL6A1, PPP1R12A, GSTM1, GSTM2, PAGE4, PPAP2B, SRD5A2, PRKCA, IGFBP6, GPM6B, OLFML3, HLF, CYP3A5, KRT15, KRT5, LAMB3, SDC1, DUSP1, EGFR1, FOS, JUN, EGR3, GADD45B, ZFP36, FAM13C, KLK2, ASPN, SFRP4, BGN, THBS2, INHBA, COL1A1, COL3A1, COL1A2, SPARC, COL8A1, COL4A1, FN1, FAP, COL5A2, CDC20, TPX2, UBE2T, MYBL2, and CDKN2C. BIN1, IGF1, C7, GSN, DES, TGFB1I1, TPM2, VCL, FLNC, ITGA7, COL6A1, PPP1R12A, GSTM1, GSTM2, PAGE4, PPAP2B, SRD5A2, PRKCA, IGFBP6, GPM6B, OLFML3, and HLF are assigned to the cellular organization gene group. CYP3A5, KRT15, KRT5, LAMB3, and SDC1 are assigned to the basal epithelial gene group. DUSP1, EGFR1, FOS, JUN, EGR3, GADD45B, and ZFP36 are assigned to the stress response gene group. FAM13C, KLK2, AZGP1, and SRD5A2 are assigned to the androgen gene group. ASPN, SFRP4, BGN, THBS2, INHBA, COL1A1, COL3A1, COL1A2, SPARC, COL8A1, COL4A1, FN1, FAP and COL5A2 are assigned to the stromal response gene group. CDC20, TPX2, UBE2T, MYBL2, and CDKN2C are assigned to the proliferation gene group. The method may further comprise determining the level of at least one RNA transcript, or an expression product thereof, selected from STAT5B, NFAT5, AZGP1, ANPEP, IGFBP2, SLC22A3, ERG, AR, SRD5A2, GSTM1, and GSTM2.

In an embodiment of the invention, the level of one or more RNA transcripts, or an expression product thereof, from each of the stromal response gene group and the cellular organization gene group is determined. In another embodiment, the level of one or more RNA transcripts, or expression products thereof, from each of the stromal response gene group and PSA gene group is determined. Additionally, the level of one or more RNA transcripts, or expression products thereof, from the cellular organization gene group and/or proliferation gene group may be determined. In this embodiment, gene(s) to be assayed from the stromal response gene group may be selected from ASPN, BGN, COL1A1, SPARC, FN1, COL3A1, COL4A1, INHBA, THBS2, and SFRP4; gene(s) to be assayed from the androgen gene group may be selected from FAM13C and KLK2; gene(s) to be assayed from the cellular organization gene group may be selected from FLNC, GSN, GSTM2, IGFBP6, PPAP2B, PPP1R12A, BIN1, VCL, IGF1, TPM2, C7, and GSTM1; and gene(s) to be assayed from the proliferation gene group may be selected from TPX2, CDC20, and MYBL2.

In a particular embodiment, the RNA transcripts, or their expression products, are selected from BGN, COL1A1, SFRP4, FLNC, GSN, TPM2, TPX2, FAM13C, KLK2, AZGP1, GSTM2, and SRD5A2. BGN, COL1A1, and SFRP4 are assigned to the stromal response gene group; FLNC, GSN, and TPM2 are assigned to the cellular organization gene group; and FAM13C and KLK2 are assigned to the androgen gene group. The level of the RNA transcripts, or their expression products, comprising at least one of the gene groups selected from the stromal response gene group, cellular organization gene group, and androgen gene group, may be determined for the method of the invention. In any of the embodiments, the androgen gene group may further comprise AZGP1 and SRD5A2.

In addition, the level of any one of the gene combinations shown in Table 4 may be determined. For instance, the RS0 model in Table 4 comprises determining the levels of the RNA transcripts, or gene expression products thereof, of ASPN, BGN, COL1A1, SPARC, FLNC, GSN, GSTM2, IGFBP6, PPAP2B, PPP1R12A, TPX2, CDC20, MYBL2, FAM13C, KLK2, STAT5B, and NFAT5. Furthermore, any one of the algorithms shown in Table 4 may be used to calculate the quantitative score for the patient.
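By way of non-limiting illustration, the score calculation described in this Summary (assigning genes to groups, then weighting expression levels by their contribution to outcome) might be sketched as follows. The gene-group assignments are those of the particular embodiment above; the weights, the use of a simple group mean, the function names, and the expression values are all hypothetical assumptions made for this sketch and are not the algorithms of Table 4.

```python
from statistics import mean

# Gene-group assignments from the particular embodiment described above.
GENE_GROUPS = {
    "stromal_response": ["BGN", "COL1A1", "SFRP4"],
    "cellular_organization": ["FLNC", "GSN", "TPM2"],
    "androgen": ["FAM13C", "KLK2"],
}

# HYPOTHETICAL weights, chosen only to make the arithmetic easy to follow.
# They are NOT the coefficients of any algorithm in Table 4.
ILLUSTRATIVE_WEIGHTS = {
    "stromal_response": 0.5,
    "cellular_organization": -0.25,
    "androgen": -0.25,
}


def group_scores(expression):
    """Average the normalized expression of the genes in each group (an assumption)."""
    return {
        group: mean(expression[gene] for gene in genes)
        for group, genes in GENE_GROUPS.items()
    }


def quantitative_score(expression, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted sum of group scores; a higher value is read as higher risk."""
    scores = group_scores(expression)
    return sum(weights[group] * scores[group] for group in scores)


# Made-up normalized expression values for one hypothetical patient.
patient = {
    "BGN": 8, "COL1A1": 10, "SFRP4": 6,
    "FLNC": 4, "GSN": 6, "TPM2": 8,
    "FAM13C": 10, "KLK2": 12,
}
print(quantitative_score(patient))
```

In this sketch a positive weight means that higher expression raises the score, consistent with the embodiment in which an increase in the quantitative score correlates with an increased likelihood of a negative clinical outcome.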

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a dendrogram depicting the association of the 81 genes selected from the gene identification study.

FIGS. 2A-2E are scatter plots showing the comparison of normalized gene expression (Cp) for matched samples from each patient where the x-axis is the gene expression from the primary Gleason pattern RP sample (PGP) and the y-axis is the gene expression from the biopsy (BX) sample. FIG. 2A: All ECM (stromal response) genes; FIG. 2B: All migration (cellular organization) genes; FIG. 2C: All proliferation genes; FIG. 2D: PSA (androgen) genes; FIG. 2E: other genes from the 81 gene list that do not fall within any of these four gene groups.

FIGS. 3A-3D are range plots of gene expression of individual genes within each gene group in the biopsy (BX) and PGP RP samples. FIG. 3A: All ECM (stromal response) genes; FIG. 3B: All migration (cellular organization) genes; FIG. 3C: All proliferation genes; FIG. 3D: other genes from the 81 gene list that do not fall within any of the gene groups.

FIG. 4 is a schematic illustration of the clique-stack method used to identify co-expressed genes.

FIG. 5 shows examples of cliques and stacks. FIG. 5(a) is an example of a graph that is not a clique; FIG. 5(b) is an example of a clique; FIG. 5(c) is an example of a clique but is not a maximal clique.

FIG. 6 is a graph showing two maximal cliques: 1-2-3-4-5 and 1-2-3-4-6.

FIG. 7 schematically illustrates stacking of two maximal cliques.

FIG. 8 is a graph showing that RS27 and CAPRA risk groups predict freedom from high-grade or non-organ-confined disease.

FIG. 9 is a graph showing that RS27 and AUA risk groups predict freedom from high-grade or non-organ-confined disease.

FIG. 10 is a graph showing time to clinical recurrence of PTEN low and PTEN normal patients from the gene identification study.

FIG. 11 is a graph showing time to clinical recurrence of patients from the gene identification study stratified into PTEN low/normal and TMPRSS-ERG negative/positive.

DEFINITIONS

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994), and March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992), provide one skilled in the art with a general guide to many of the terms used in the present application.

One skilled in the art will recognize many methods and materials similar or equivalent to those described herein, which could be used in the practice of the present invention. Indeed, the present invention is in no way limited to the methods and materials described herein. For purposes of the invention, the following terms are defined below.

The terms “tumor” and “lesion” as used herein, refer to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. Those skilled in the art will realize that a tumor tissue sample may comprise multiple biological elements, such as one or more cancer cells, partial or fragmented cells, tumors in various stages, surrounding histologically normal-appearing tissue, and/or macro or micro-dissected tissue.

The terms “cancer” and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth. Examples of cancer in the present disclosure include cancer of the urogenital tract, such as prostate cancer.

As used herein, the term “prostate cancer” is used in the broadest sense and refers to all stages and all forms of cancer arising from the tissue of the prostate gland.

Staging of the cancer assists a physician in assessing how far the disease has progressed and in planning a treatment for the patient. Staging may be done clinically (clinical staging) by physical examination, blood tests, or response to radiation therapy, and/or pathologically (pathologic staging) based on surgery, such as radical prostatectomy. According to the tumor, node, metastasis (TNM) staging system of the American Joint Committee on Cancer (AJCC), AJCC Cancer Staging Manual (7th Ed., 2010), the various stages of prostate cancer are defined as follows: Tumor: T1: clinically inapparent tumor not palpable or visible by imaging, T1a: tumor incidental histological finding in 5% or less of tissue resected, T1b: tumor incidental histological finding in more than 5% of tissue resected, T1c: tumor identified by needle biopsy; T2: tumor confined within prostate, T2a: tumor involves one half of one lobe or less, T2b: tumor involves more than half of one lobe, but not both lobes, T2c: tumor involves both lobes; T3: tumor extends through the prostatic capsule, T3a: extracapsular extension (unilateral or bilateral), T3b: tumor invades seminal vesicle(s); T4: tumor is fixed or invades adjacent structures other than seminal vesicles (bladder neck, external sphincter, rectum, levator muscles, or pelvic wall). Generally, a clinical T (cT) stage is T1 or T2 and pathologic T (pT) stage is T2 or higher. Node: N0: no regional lymph node metastasis; N1: metastasis in regional lymph nodes. Metastasis: M0: no distant metastasis; M1: distant metastasis present.

The Gleason Grading system is used to help evaluate the prognosis of men with prostate cancer. Together with other parameters, it is incorporated into a strategy of prostate cancer staging, which predicts prognosis and helps guide therapy. A Gleason “score” or “grade” is given to prostate cancer based upon its microscopic appearance. Tumors with a low Gleason score typically grow slowly enough that they may not pose a significant threat to the patients in their lifetimes. These patients are monitored (“watchful waiting” or “active surveillance”) over time. Cancers with a higher Gleason score are more aggressive and have a worse prognosis, and these patients are generally treated with surgery (e.g., radical prostatectomy) and, in some cases, therapy (e.g., radiation, hormone, ultrasound, chemotherapy). Gleason scores (or sums) comprise grades of the two most common tumor patterns. These patterns are referred to as Gleason patterns 1-5, with pattern 1 being the most well-differentiated. Most tumors have a mixture of patterns. To obtain a Gleason score or grade, the dominant pattern is added to the second most prevalent pattern to obtain a number between 2 and 10. The Gleason Grades include: G1: well differentiated (slight anaplasia) (Gleason 2-4); G2: moderately differentiated (moderate anaplasia) (Gleason 5-6); G3-4: poorly differentiated/undifferentiated (marked anaplasia) (Gleason 7-10).
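The arithmetic of the Gleason score and its mapping to the grade categories listed above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this example.

```python
def gleason_score(primary, secondary):
    """Add the dominant pattern to the second most prevalent pattern (each 1-5)."""
    for pattern in (primary, secondary):
        if pattern not in range(1, 6):
            raise ValueError("Gleason patterns range from 1 to 5")
    return primary + secondary


def gleason_grade(score):
    """Map a Gleason score (2-10) to the grade categories listed above."""
    if 2 <= score <= 4:
        return "G1"    # well differentiated
    if 5 <= score <= 6:
        return "G2"    # moderately differentiated
    if 7 <= score <= 10:
        return "G3-4"  # poorly differentiated/undifferentiated
    raise ValueError("Gleason scores range from 2 to 10")


# Example: dominant pattern 3, second most prevalent pattern 4.
score = gleason_score(3, 4)
print(score, gleason_grade(score))
```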

Stage groupings: Stage I: T1a N0 M0 G1; Stage II: (T1a N0 M0 G2-4) or (T1b, c, T1, T2, N0 M0 Any G); Stage III: T3 N0 M0 Any G; Stage IV: (T4 N0 M0 Any G) or (Any T N1 M0 Any G) or (Any T Any N M1 Any G).

The term “upgrading” as used herein refers to an increase in Gleason grade determined from biopsy to Gleason grade determined from radical prostatectomy (RP). For example, upgrading includes a change in Gleason grade from 3+3 or 3+4 on biopsy to 3+4 or greater on RP. “Significant upgrading” or “upgrade2” as used herein, refers to a change in Gleason grade from 3+3 or 3+4 determined from biopsy to 4+3 or greater, or seminal vesicle involvement (SVI), or extracapsular extension (ECE) as determined from RP.
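As a hedged illustration of these two definitions, the sketch below represents a Gleason grade as a (primary, secondary) pair. The ordering rule (compare the sum first, then the primary pattern, so that 3+4 ranks below 4+3) and the function names are assumptions introduced for this example; the text does not prescribe a particular ordering.

```python
# Biopsy grades covered by the definitions above.
ELIGIBLE_BIOPSY_GRADES = {(3, 3), (3, 4)}


def _rank(grade):
    """Order grades by sum, then by primary pattern (an assumed convention)."""
    primary, secondary = grade
    return (primary + secondary, primary)


def is_upgraded(biopsy, rp):
    """True when the RP grade ranks higher than the biopsy grade."""
    return _rank(rp) > _rank(biopsy)


def is_significantly_upgraded(biopsy, rp, svi=False, ece=False):
    """True for RP grade of 4+3 or greater, or SVI, or ECE found at RP."""
    if biopsy not in ELIGIBLE_BIOPSY_GRADES:
        raise ValueError("sketch only handles biopsy grades 3+3 and 3+4")
    return bool(_rank(rp) >= _rank((4, 3)) or svi or ece)


print(is_upgraded((3, 3), (3, 4)))                # upgraded
print(is_significantly_upgraded((3, 3), (3, 4)))  # but not significantly
```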

The term “high grade” as used herein refers to Gleason score of >=3+4 or >=4+3 on RP. The term “low grade” as used herein refers to a Gleason score of 3+3 on RP. In a particular embodiment, “high grade” disease refers to Gleason score of at least major pattern 4, minor pattern 5, or tertiary pattern 5.

The term “upstaging” as used herein refers to an increase in tumor stage from biopsy to tumor stage at RP. For example, upstaging is a change in tumor stage from clinical T1 or T2 stage at biopsy to pathologic T3 stage at RP.

The term “non organ-confined disease” as used herein refers to having pathologic stage T3 disease at RP. The term “organ-confined” as used herein refers to pathologic stage pT2 at RP.

The term “adverse pathology” as used herein refers to a high grade disease as defined above, or non organ-confined disease as defined above. In a particular embodiment, “adverse pathology” refers to prostate cancer with a Gleason score of >=3+4 or >=4+3 or pathologic stage T3.

In another embodiment, the term “high-grade or non-organ-confined disease” refers to prostate cancer with a Gleason score of at least major pattern 4, minor pattern 5, or tertiary pattern 5, or pathologic stage T3.

As used herein, the terms “active surveillance” and “watchful waiting” mean closely monitoring a patient's condition without giving any treatment until symptoms appear or change. For example, in prostate cancer, watchful waiting is usually used in older men with other medical problems and early-stage disease.

As used herein, the term “surgery” applies to surgical methods undertaken for removal of cancerous tissue, including pelvic lymphadenectomy, radical prostatectomy, transurethral resection of the prostate (TURP), excision, dissection, and tumor biopsy/removal. The tumor tissue or sections used for gene expression analysis may have been obtained from any of these methods.

As used herein, the term “biological sample containing cancer cells” refers to a sample comprising tumor material obtained from a cancer patient. The term encompasses tumor tissue samples, for example, tissue obtained by radical prostatectomy and tissue obtained by biopsy, such as for example, a core biopsy or a fine needle biopsy. The biological sample may be fresh, frozen, or a fixed, wax-embedded tissue sample, such as a formalin-fixed, paraffin-embedded tissue sample. A biological sample also encompasses bodily fluids containing cancer cells, such as blood, plasma, serum, urine, and the like. Additionally, the term “biological sample containing cancer cells” encompasses a sample comprising tumor cells obtained from sites other than the primary tumor, e.g., circulating tumor cells. The term also encompasses cells that are the progeny of the patient's tumor cells, e.g. cell culture samples derived from primary tumor cells or circulating tumor cells. The term further encompasses samples that may comprise protein or nucleic acid material shed from tumor cells in vivo, e.g., bone marrow, blood, plasma, serum, and the like. The term also encompasses samples that have been enriched for tumor cells or otherwise manipulated after their procurement and samples comprising polynucleotides and/or polypeptides that are obtained from a patient's tumor material.

Prognostic factors are those variables related to the natural history of cancer that influence the recurrence rates and outcome of patients once they have developed cancer. Clinical parameters that have been associated with a worse prognosis include, for example, increased tumor stage, high PSA level at presentation, and high Gleason grade or pattern. Prognostic factors are frequently used to categorize patients into subgroups with different baseline relapse risks.

The term “prognosis” is used herein to refer to the likelihood that a cancer patient will have a cancer-attributable death or progression, including recurrence, metastatic spread, and drug resistance, of a neoplastic disease, such as prostate cancer. For example, a “good prognosis” would include long term survival without recurrence and a “bad prognosis” would include cancer recurrence.

A “positive clinical outcome” can be assessed using any endpoint indicating a benefit to the patient, including, without limitation, (1) inhibition, to some extent, of tumor growth, including slowing down and complete growth arrest; (2) reduction in the number of tumor cells; (3) reduction in tumor size; (4) inhibition (i.e., reduction, slowing down, or complete stopping) of tumor cell infiltration into adjacent peripheral organs and/or tissues; (5) inhibition of metastasis; (6) enhancement of anti-tumor immune response, possibly resulting in regression or rejection of the tumor; (7) relief, to some extent, of one or more symptoms associated with the tumor; (8) increase in the duration of survival following treatment; and/or (9) decreased mortality at a given point of time following treatment. Positive clinical outcome can also be considered in the context of an individual's outcome relative to an outcome of a population of patients having a comparable clinical diagnosis, and can be assessed using various endpoints such as an increase in the duration of Recurrence-Free Interval (RFI), an increase in survival time (Overall Survival (OS)) or prostate cancer-specific survival time (Prostate Cancer-Specific Survival (PCSS)) in a population, no upstaging or upgrading in tumor stage or Gleason grade between biopsy and radical prostatectomy, presence of 3+3 grade and organ-confined disease at radical prostatectomy, and the like.

The term “risk classification” means a grouping of subjects by the level of risk (or likelihood) that the subject will experience a particular negative clinical outcome. A subject may be classified into a risk group or classified at a level of risk based on the methods of the present disclosure, e.g. high, medium, or low risk. A “risk group” is a group of subjects or individuals with a similar level of risk for a particular clinical outcome.

The term “long-term” survival is used herein to refer to survival for a particular time period, e.g., for at least 5 years, or for at least 10 years.

The term “recurrence” is used herein to refer to local or distant recurrence (i.e., metastasis) of cancer. For example, prostate cancer can recur locally in the tissue next to the prostate or in the seminal vesicles. The cancer may also affect the surrounding lymph nodes in the pelvis or lymph nodes outside this area. Prostate cancer can also spread to tissues next to the prostate, such as pelvic muscles, bones, or other organs. Recurrence can be determined by clinical recurrence detected by, for example, imaging study or biopsy, or biochemical recurrence detected by, for example, sustained follow-up prostate-specific antigen (PSA) levels ≥ 0.4 ng/mL or the initiation of salvage therapy as a result of a rising PSA level.

The term “clinical recurrence-free interval (cRFI)” is used herein to mean the time from surgery to first clinical recurrence or death due to clinical recurrence of prostate cancer. If follow-up ended without occurrence of clinical recurrence, or other primary cancers or death occurred prior to clinical recurrence, time to cRFI is considered censored; when this occurs, the only information known is that up through the censoring time, clinical recurrence has not occurred in this subject. Biochemical recurrences are ignored for the purposes of calculating cRFI.

The term “biochemical recurrence-free interval (bRFI)” is used herein to mean the time from surgery to first biochemical recurrence of prostate cancer. If clinical recurrence occurred before biochemical recurrence, follow-up ended without occurrence of biochemical recurrence, or other primary cancers or death occurred prior to biochemical recurrence, time to biochemical recurrence is considered censored at the first of these.

The term “Overall Survival (OS)” is used herein to refer to the time from surgery to death from any cause. If the subject was still alive at the time of last follow-up, survival time is considered censored at the time of last follow-up. Biochemical recurrence and clinical recurrence are ignored for the purposes of calculating OS.

The term “Prostate Cancer-Specific Survival (PCSS)” is used herein to describe the time from surgery to death from prostate cancer. If the patient did not die of prostate cancer before end of follow-up, or died due to other causes, PCSS is considered censored at this time. Clinical recurrence and biochemical recurrence are ignored for the purposes of calculating PCSS.

In practice, the calculation of the time-to-event measures listed above may vary from study to study depending on the definition of events to be considered censored.
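The censoring rules in the time-to-event definitions above can be sketched as follows. This is a minimal illustration only: the function and argument names are hypothetical, all times are assumed to be measured from surgery in the same unit, a missing event is represented by None, and a tie between the event of interest and a censoring event is counted as an observed event (an assumption the text does not address).

```python
from typing import Iterable, Optional, Tuple


def time_to_event(event_time: Optional[float],
                  censoring_times: Iterable[Optional[float]]) -> Tuple[float, bool]:
    """Return (time, observed); observed is False when the time is censored."""
    known = [t for t in censoring_times if t is not None]
    first_censor = min(known) if known else None
    if event_time is not None and (first_censor is None or event_time <= first_censor):
        return event_time, True
    if first_censor is None:
        raise ValueError("no event and no follow-up information")
    return first_censor, False


def crfi(clinical_recurrence, other_primary_cancer, death_other_cause, end_of_followup):
    """cRFI: censored by other primary cancer, other death, or end of follow-up."""
    return time_to_event(
        clinical_recurrence,
        [other_primary_cancer, death_other_cause, end_of_followup],
    )


def brfi(biochemical_recurrence, clinical_recurrence, other_primary_cancer,
         death, end_of_followup):
    """bRFI: additionally censored by an earlier clinical recurrence."""
    return time_to_event(
        biochemical_recurrence,
        [clinical_recurrence, other_primary_cancer, death, end_of_followup],
    )


# A patient followed for 5 years with no clinical recurrence is censored at 5.
print(crfi(None, None, None, 5.0))
```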

As used herein, the term “expression level” as applied to a gene refers to the normalized level of a gene product, e.g. the normalized value determined for the RNA level of a gene or for the polypeptide level of a gene.

The terms “gene product” and “expression product” are used herein to refer to the RNA (ribonucleic acid) transcription products (transcripts) of the gene, including mRNA, and the polypeptide translation products of such RNA transcripts. A gene product can be, for example, an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, a polypeptide, a post-translationally modified polypeptide, a splice variant polypeptide, etc.

The term “RNA transcript” as used herein refers to the RNA transcription products of a gene, including, for example, mRNA, an unspliced RNA, a splice variant mRNA, a microRNA, and a fragmented RNA.

Unless indicated otherwise, each gene name used herein corresponds to the Official Symbol assigned to the gene and provided by Entrez Gene (URL: www.ncbi.nlm.nih.gov/sites/entrez) as of the filing date of this application.

The term “microarray” refers to an ordered arrangement of hybridizable array elements, e.g. oligonucleotide or polynucleotide probes, on a substrate.

The term “polynucleotide” generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term “polynucleotide” specifically includes cDNAs. The term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons, are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.

The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.

The term “Ct” as used herein refers to threshold cycle, the cycle number in quantitative polymerase chain reaction (qPCR) at which the fluorescence generated within a reaction well exceeds the defined threshold, i.e. the point during the reaction at which a sufficient number of amplicons have accumulated to meet the defined threshold.
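The Ct definition can be illustrated with a minimal sketch that returns the first whole cycle at which a well's fluorescence exceeds the threshold. The function name and the fluorescence readings are hypothetical; instrument software may interpolate between cycles to report a fractional Ct, which this sketch does not attempt.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the first 1-based cycle whose signal exceeds the threshold, else None."""
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal > threshold:
            return cycle
    return None  # the threshold was never exceeded


# Made-up fluorescence readings for cycles 1 through 5.
readings = [0.1, 0.2, 0.5, 1.2, 2.5]
print(threshold_cycle(readings, threshold=1.0))
```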

The term “Cp” as used herein refers to “crossing point.” The Cp value is calculated by determining the second derivatives of entire qPCR amplification curves and their maximum value. The Cp value represents the cycle at which the increase of fluorescence is highest and where the logarithmic phase of a PCR begins.
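A rough sketch of the second-derivative-maximum idea follows. It approximates the second derivative with a discrete central difference over per-cycle readings, which is an assumption made for illustration; the function name and readings are hypothetical, and instrument software typically fits a smooth curve and reports a fractional Cp.

```python
def crossing_point(fluorescence):
    """Return the 1-based cycle at which the discrete second difference is largest."""
    if len(fluorescence) < 3:
        raise ValueError("need at least three cycles to form a second difference")
    # Central second difference f[i+1] - 2*f[i] + f[i-1], keyed by cycle number i+1.
    second_diff = {
        i + 1: fluorescence[i + 1] - 2 * fluorescence[i] + fluorescence[i - 1]
        for i in range(1, len(fluorescence) - 1)
    }
    # On ties, max() returns the earliest cycle because dicts keep insertion order.
    return max(second_diff, key=second_diff.get)


# Made-up sigmoid-like amplification curve for cycles 1 through 9.
curve = [0, 0, 0, 1, 5, 10, 13, 14, 14]
print(crossing_point(curve))
```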

The terms “threshold” or “thresholding” refer to a procedure used to account for non-linear relationships between gene expression measurements and clinical response as well as to further reduce variation in reported patient scores. When thresholding is applied, all measurements below or above a threshold are set to that threshold value. A non-linear relationship between gene expression and outcome could be examined using smoothers or cubic splines to model gene expression on recurrence free interval using Cox PH regression or on adverse pathology status using logistic regression. D. Cox, Journal of the Royal Statistical Society, Series B 34:187-220 (1972). Variation in reported patient scores could be examined as a function of variability in gene expression at the limit of quantitation and/or detection for a particular gene.
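The thresholding step itself amounts to clamping a measurement at a bound. In the minimal sketch below, the function name, the optional lower and upper bounds, and the numeric values are hypothetical and are not thresholds disclosed for any particular gene.

```python
def apply_threshold(value, lower=None, upper=None):
    """Set a measurement below `lower` or above `upper` to that threshold value."""
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value


# A made-up expression value of 3.2 with a hypothetical lower threshold of 5.0.
print(apply_threshold(3.2, lower=5.0))
```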

As used herein, the term “amplicon” refers to pieces of DNA that have been synthesized using amplification techniques, such as polymerase chain reactions (PCR) and ligase chain reactions.

“Stringency” of hybridization reactions is readily determinable by one of ordinary skill in the art, and generally is an empirical calculation dependent upon probe length, washing temperature, and salt concentration. In general, longer probes require higher temperatures for proper annealing, while shorter probes need lower temperatures. Hybridization generally depends on the ability of denatured DNA to re-anneal when complementary strands are present in an environment below their melting temperature. The higher the degree of desired homology between the probe and hybridizable sequence, the higher the relative temperature which can be used. As a result, it follows that higher relative temperatures would tend to make the reaction conditions more stringent, while lower temperatures less so. For additional details and explanation of stringency of hybridization reactions, see Ausubel et al., Current Protocols in Molecular Biology (Wiley Interscience Publishers, 1995).

“Stringent conditions” or “high stringency conditions”, as defined herein, typically: (1) employ low ionic strength and high temperature for washing, for example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50° C.; (2) employ during hybridization a denaturing agent, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium citrate at 42° C.; or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at 42° C. in 0.2×SSC (sodium chloride/sodium citrate) and 50% formamide, followed by a high-stringency wash consisting of 0.1×SSC containing EDTA at 55° C.

“Moderately stringent conditions” may be identified as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, New York: Cold Spring Harbor Press, 1989, and include the use of washing solution and hybridization conditions (e.g., temperature, ionic strength and % SDS) less stringent than those described above. An example of moderately stringent conditions is overnight incubation at 37° C. in a solution comprising: 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C. The skilled artisan will recognize how to adjust the temperature, ionic strength, etc. as necessary to accommodate factors such as probe length and the like.

The terms “splicing” and “RNA splicing” are used interchangeably and refer to RNA processing that removes introns and joins exons to produce mature mRNA with continuous coding sequence that moves into the cytoplasm of a eukaryotic cell.

As used herein, the terms “TMPRSS fusion” and “TMPRSS2 fusion” are used interchangeably and refer to a fusion of the androgen-driven TMPRSS2 gene with the ERG oncogene, which has been demonstrated to have a significant association with prostate cancer. S. Perner, et al., Urologe A. 46(7):754-760 (2007); S. A. Narod, et al., Br J Cancer 99(6):847-851 (2008). As used herein, positive TMPRSS fusion status indicates that the TMPRSS fusion is present in a tissue sample, whereas negative TMPRSS fusion status indicates that the TMPRSS fusion is not present in a tissue sample. Experts skilled in the art will recognize that there are numerous ways to determine TMPRSS fusion status, such as real-time, quantitative PCR or high-throughput sequencing. See, e.g., K. Mertz, et al., Neoplasia 9(3):200-206 (2007); C. Maher, Nature 458(7234):97-101 (2009).

The terms “correlated” and “associated” are used interchangeably herein to refer to the association between two measurements (or measured entities). The disclosure provides genes or gene subsets, the expression levels of which are associated with clinical outcome. For example, the increased expression level of a gene may be positively correlated (positively associated) with a good or positive clinical outcome. Such a positive correlation may be demonstrated statistically in various ways, e.g. by a cancer recurrence hazard ratio less than one or by a cancer upgrading or upstaging odds ratio of less than one. In another example, the increased expression level of a gene may be negatively correlated (negatively associated) with a good or positive clinical outcome. In that case, for example, the patient may experience a cancer recurrence or upgrading/upstaging of the cancer, and this may be demonstrated statistically in various ways, e.g., a hazard ratio greater than one or an odds ratio greater than one. “Correlation” is also used herein to refer to the strength of association between the expression levels of two different genes, such that the expression level of a first gene can be substituted with an expression level of a second gene in a given algorithm if their expression levels are highly correlated. Such “correlated expression” of two genes that are substitutable in an algorithm usually involves gene expression levels that are positively correlated with one another, e.g., if increased expression of a first gene is positively correlated with an outcome (e.g., increased likelihood of good clinical outcome), then the second gene that is co-expressed and exhibits correlated expression with the first gene is also positively correlated with the same outcome.

The terms “co-express” and “co-expressed”, as used herein, refer to a statistical correlation between the amounts of different transcript sequences across a population of different patients. Pairwise co-expression may be calculated by various methods known in the art, e.g., by calculating Pearson correlation coefficients or Spearman correlation coefficients. Co-expressed gene cliques may also be identified by seeding and stacking the maximal clique enumeration (MCE) described in Example 4 herein. An analysis of co-expression may be calculated using normalized expression data. Genes within the same gene subset are also considered to be co-expressed.
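For illustration, pairwise co-expression between two transcripts measured across the same patients may be computed as a Pearson coefficient on the normalized values, or as a Spearman coefficient by applying the same formula to ranks. The sketch below uses only the Python standard library; the function names and the expression values are hypothetical and are not data from the Examples.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical normalized expression of two transcripts in five patients
gene_a = [6.1, 7.4, 8.0, 9.3, 10.2]
gene_b = [5.0, 5.9, 6.1, 8.8, 12.5]
rho = spearman(gene_a, gene_b)
```

Because the Spearman coefficient depends only on rank order, it is less sensitive than the Pearson coefficient to outlying expression values.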

A “computer-based system” refers to a system of hardware, software, anddata storage medium used to analyze information. The minimum hardware ofa patient computer-based system comprises a central processing unit(CPU), and hardware for data input, data output (e.g., display), anddata storage. An ordinarily skilled artisan can readily appreciate thatany currently available computer-based systems and/or components thereofare suitable for use in connection with the methods of the presentdisclosure. The data storage medium may comprise any manufacturecomprising a recording of the present information as described above, ora memory access device that can access such a manufacture.

To “record” data, programming or other information on a computer readable medium refers to a process for storing information, using any such methods as known in the art. Any convenient data storage structure may be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.

A “processor” or “computing means” references any hardware and/orsoftware combination that will perform the functions required of it. Forexample, a suitable processor may be a programmable digitalmicroprocessor such as available in the form of an electroniccontroller, mainframe, server or personal computer (desktop orportable). Where the processor is programmable, suitable programming canbe communicated from a remote location to the processor, or previouslysaved in a computer program product (such as a portable or fixedcomputer readable storage medium, whether magnetic, optical or solidstate device based). For example, a magnetic medium or optical disk maycarry the programming, and can be read by a suitable readercommunicating with each processor at its corresponding station.

Algorithm-Based Methods and Gene Subsets

The present invention provides an algorithm-based molecular diagnostic assay for predicting a clinical outcome for a patient with prostate cancer. The expression level of one or more genes may be used alone or arranged into functional gene subsets to calculate a quantitative score that can be used to predict the likelihood of a clinical outcome. The algorithm-based assay and associated information provided by the practice of the methods of the present invention facilitate optimal treatment decision-making in prostate cancer. For example, such a clinical tool would enable physicians to identify patients who have a low likelihood of having an aggressive cancer and therefore would not need RP, or who have a high likelihood of having an aggressive cancer and therefore would need RP.

As used herein, a “quantitative score” is an arithmetically or mathematically calculated numerical value for aiding in simplifying or disclosing or informing the analysis of more complex quantitative information, such as the correlation of certain expression levels of the disclosed genes or gene subsets to a likelihood of a clinical outcome of a prostate cancer patient. A quantitative score may be determined by the application of a specific algorithm. The algorithm used to calculate the quantitative score in the methods disclosed herein may group the expression level values of genes. The grouping of genes may be performed at least in part based on knowledge of the relative contribution of the genes according to physiologic functions or component cellular characteristics, such as in the groups discussed herein. A quantitative score may be determined for a gene group (“gene group score”). The formation of groups, in addition, can facilitate the mathematical weighting of the contribution of various expression levels of genes or gene subsets to the quantitative score. The weighting of a gene or gene group representing a physiological process or component cellular characteristic can reflect the contribution of that process or characteristic to the pathology of the cancer and clinical outcome, such as recurrence or upgrading/upstaging of the cancer. The present invention provides a number of algorithms for calculating the quantitative scores, for example, as set forth in Table 4. In an embodiment of the invention, an increase in the quantitative score indicates an increased likelihood of a negative clinical outcome.
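The grouping and weighting described above can be sketched, purely schematically, as a weighted sum of gene group scores. In the sketch below, each gene group score is assumed to be the mean of the normalized expression values of the genes in that group; this averaging rule, the weights (including their signs and magnitudes), and the expression values are placeholders chosen for illustration and are not the algorithms of Table 4.

```python
def gene_group_score(expression, genes):
    """Assumed group score: mean normalized expression of the group's genes."""
    return sum(expression[g] for g in genes) / len(genes)


def quantitative_score(expression, groups, weights):
    """Weighted sum of gene group scores (weights are hypothetical)."""
    return sum(
        weights[name] * gene_group_score(expression, genes)
        for name, genes in groups.items()
    )


# Hypothetical normalized expression values for one patient sample
expression = {"BGN": 8.0, "COL1A1": 10.0, "SFRP4": 9.0, "FAM13C": 6.0, "KLK2": 12.0}

# Gene names are drawn from the groups named in this disclosure;
# the weights below are placeholders only.
groups = {"ecm": ["BGN", "COL1A1", "SFRP4"], "androgen": ["FAM13C", "KLK2"]}
weights = {"ecm": 1.0, "androgen": -0.5}

score = quantitative_score(expression, groups, weights)
```

Computing a score per group before weighting keeps the contribution of each physiological process separate, so that a weight can be assigned to the process rather than to each individual gene.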

In an embodiment, a quantitative score is a “recurrence score,” which indicates the likelihood of a cancer recurrence, upgrading or upstaging of a cancer, adverse pathology, non-organ-confined disease, high-grade disease, and/or high-grade or non-organ-confined disease. An increase in the recurrence score may correlate with an increase in the likelihood of cancer recurrence, upgrading or upstaging of a cancer, adverse pathology, non-organ-confined disease, high-grade disease, and/or high-grade or non-organ-confined disease.

The gene subsets of the present invention include an ECM gene group, migration gene group, androgen gene group, proliferation gene group, epithelia gene group, and stress gene group.

The gene subsets referred to herein as the “ECM gene group,” “stromal gene group,” and “stromal response gene group” are used interchangeably and include genes that are synthesized predominantly by stromal cells and are involved in stromal response and genes that co-express with the genes of the ECM gene group. “Stromal cells” are referred to herein as connective tissue cells that make up the support structure of biological tissues. Stromal cells include fibroblasts, immune cells, pericytes, endothelial cells, and inflammatory cells. “Stromal response” refers to a desmoplastic response of the host tissues at the site of a primary tumor or invasion. See, e.g., E. Rubin, J. Farber, Pathology, 985-986 (2nd Ed. 1994). The ECM gene group includes, for example, ASPN, SFRP4, BGN, THBS2, INHBA, COL1A1, COL3A1, COL1A2, SPARC, COL8A1, COL4A1, FN1, FAP, and COL5A2, and co-expressed genes thereof. Exemplary co-expressed genes include the genes and/or gene cliques shown in Table 8.

The gene subsets referred to herein as the “migration gene group” or “migration regulation gene group” or “cytoskeletal gene group” or “cellular organization gene group” are used interchangeably and include genes and co-expressed genes that are part of a dynamic microfilament network of actin and accessory proteins and that provide intracellular support to cells, generate the physical forces for cell movement and cell division, as well as facilitate intracellular transport of vesicles and cellular organelles. The migration gene group includes, for example, BIN1, IGF1, C7, GSN, DES, TGFB1I1, TPM2, VCL, FLNC, ITGA7, COL6A1, PPP1R12A, GSTM1, GSTM2, PAGE4, PPAP2B, SRD5A2, PRKCA, IGFBP6, GPM6B, OLFML3, and HLF, and co-expressed genes thereof. Exemplary co-expressed genes and/or gene cliques are provided in Table 9.

The gene subsets referred to herein as the “androgen gene group,” “PSA gene group,” and “PSA regulation gene group” are used interchangeably and include genes that are members of the kallikrein family of serine proteases (e.g. kallikrein 3 [PSA]), and genes that co-express with genes of the androgen gene group. The androgen gene group includes, for example, FAM13C and KLK2, and co-expressed genes thereof. The androgen gene group may further comprise AZGP1 and SRD5A2, and co-expressed genes thereof.

The gene subsets referred to herein as the “proliferation gene group” and “cell cycle gene group” are used interchangeably and include genes that are involved with cell cycle functions and genes that co-express with genes of the proliferation gene group. “Cell cycle functions” as used herein refers to cell proliferation and cell cycle control, e.g., checkpoint/G1 to S phase transition. The proliferation gene group thus includes, for example, CDC20, TPX2, UBE2T, MYBL2, and CDKN2C, and co-expressed genes thereof. Exemplary co-expressed genes and/or gene cliques are provided in Table 10.

The gene subsets referred to herein as the “epithelia gene group” and “basal epithelia gene group” are used interchangeably and include genes that are expressed during the differentiation of a polarized epithelium and that provide intracellular structural integrity to facilitate physical interactions with neighboring epithelial cells, and genes that co-express with genes of the epithelia gene group. The epithelia gene group includes, for example, CYP3A5, KRT15, KRT5, LAMB3, and SDC1, and co-expressed genes thereof.

The gene subsets referred to herein as the “stress gene group,” “stress response gene group,” and “early response gene group” are used interchangeably and include genes and co-expressed genes that are transcription factors and DNA-binding proteins activated rapidly and transiently in response to cellular stress and other extracellular signals. These factors, in turn, regulate the transcription of a diverse range of genes. The stress gene group includes, for example, DUSP1, EGR1, FOS, JUN, EGR3, GADD45B, and ZFP36, and co-expressed genes thereof. Exemplary co-expressed genes and/or gene cliques are provided in Table 11.

Expression levels of other genes and their co-expressed genes may be used with one or more of the above gene subsets to predict a likelihood of a clinical outcome of a prostate cancer patient. For example, the expression level of one or more genes selected from the 81 genes of FIG. 1 or Table 1A or 1B that do not fall within any of the disclosed gene subsets may be used with one or more of the disclosed gene subsets. In an embodiment of the invention, one or more of STAT5B, NFAT5, AZGP1, ANPEP, IGFBP2, SLC22A3, ERG, AR, SRD5A2, GSTM1, and GSTM2 may be used in one or more gene subsets described above to predict a likelihood of a clinical outcome.

The present invention also provides methods to determine a threshold expression level for a particular gene. A threshold expression level may be calculated for a specific gene. A threshold expression level for a gene may be based on a normalized expression level. In one example, a C_(p) threshold expression level may be calculated by assessing functional forms using logistic regression or Cox proportional hazards regression.

The present invention further provides methods to determine genes that co-express with particular genes identified by, e.g., quantitative RT-PCR (qRT-PCR), as validated biomarkers relevant to a particular type of cancer. The co-expressed genes are themselves useful biomarkers. The co-expressed genes may be substituted for the genes with which they co-express. The methods can include identifying gene cliques from microarray data, normalizing the microarray data, computing a pairwise Spearman correlation matrix for the array probes, filtering out significant co-expressed probes across different studies, building a graph, mapping the probe to genes, and generating a gene clique report. An exemplary method for identifying co-expressed genes is described in Example 3 below, and co-expressed genes identified using this method are provided in Tables 8-11. The expression levels of one or more genes of a gene clique may be used to calculate the likelihood that a patient with prostate cancer will experience a positive clinical outcome, such as a reduced likelihood of a cancer recurrence.
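The graph-building and clique-identification steps listed above can be sketched as follows, assuming a pairwise correlation table has already been computed and filtered. The sketch connects two probes when their correlation meets a cutoff and then enumerates maximal cliques with the basic Bron-Kerbosch recursion; this is a generic textbook procedure substituted for illustration, not the seeded and stacked enumeration of the Examples. The probe labels, correlation values, and cutoff are hypothetical.

```python
def build_graph(correlations, cutoff):
    """Build an undirected graph: nodes are probes, and an edge joins two
    probes whose correlation is at least the cutoff."""
    graph = {}
    for (a, b), rho in correlations.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if rho >= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical pairwise Spearman correlations among four probes
correlations = {
    ("A", "B"): 0.90, ("A", "C"): 0.80, ("B", "C"): 0.85,
    ("C", "D"): 0.75, ("A", "D"): 0.10, ("B", "D"): 0.20,
}
graph = build_graph(correlations, cutoff=0.7)
cliques = maximal_cliques(graph)
```

In this toy example probes A, B and C form one clique because every pair among them is correlated above the cutoff, while D joins only C; mapping probes to genes and reporting the cliques would follow as separate steps.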

Any one or more combinations of gene groups may be assayed in the method of the present invention. For example, a stromal response gene group may be assayed, alone or in combination, with a cellular organization gene group, a proliferation gene group, and/or an androgen gene group. In addition, any number of genes within each gene group may be assayed.

In a specific embodiment of the invention, a method for predicting aclinical outcome for a patient with prostate cancer comprises measuringan expression level of at least one gene from a stromal response genegroup, or a co-expressed gene thereof, and at least one gene from acellular organization gene group, or a co-expressed gene thereof. Inanother embodiment, the expression level of at least two genes from astromal response gene group, or a co-expressed gene thereof, and atleast two genes from a cellular organization gene group, or aco-expressed gene thereof, are measured. In yet another embodiment, theexpression levels of at least three genes are measured from each of thestromal response gene group and the cellular organization gene group. Ina further embodiment, the expression levels of at least four genes aremeasured from each of the stromal response gene group and the cellularorganization gene group. In another embodiment, the expression levels ofat least five genes are measured from each of the stromal response genegroup and the cellular organization gene group. In yet a furtherembodiment, the expression levels of at least six genes are measuredfrom each of the stromal response gene group and the cellularorganization gene group.

In another specific embodiment, the expression level of at least onegene from the stromal response gene group, or a co-expressed genethereof, may be measured in addition to the expression level of at leastone gene from an androgen gene group, or a co-expressed gene thereof. Ina particular embodiment, the expression levels of at least three genes,or co-expressed genes thereof, from the stromal response gene group, andthe expression level of at least one gene, or co-expressed gene thereof,from the androgen gene group may be measured.

In a further embodiment, the expression level of at least one gene eachfrom the stromal response gene group, the androgen gene group, and thecellular organization gene group, or co-expressed genes thereof, may bemeasured. In a particular embodiment, the level of at least three genesfrom the stromal response gene group, at least one gene from theandrogen gene group, and at least three genes from the cellularorganization gene group may be measured. In another embodiment, theexpression level of at least one gene each from the stromal responsegene group, the androgen gene group, and the proliferation gene group,or co-expressed genes thereof, may be measured. In a particularembodiment, the level of at least three genes from the stromal responsegene group, at least one gene from the androgen gene group, and at leastone gene from the proliferation gene group may be measured. In either ofthese combinations, at least two genes from the androgen gene group mayalso be measured. In any of the combinations, at least four genes fromthe androgen gene group may also be measured.

In another embodiment, the expression level of at least one gene eachfrom the stromal response gene group, the androgen gene group, thecellular organization gene group, and the proliferation gene group, orco-expressed genes thereof, may be measured. In a particular embodiment,the level of at least three genes from the stromal response gene group,at least three genes from the cellular organization gene group, at leastone gene from the proliferation gene group, and at least two genes fromthe androgen gene group may be measured. In any of the embodiments, atleast four genes from the androgen gene group may be measured.

Additionally, expression levels of one or more genes that do not fall within the gene subsets described herein may be measured with any of the combinations of the gene subsets described herein. Alternatively, any gene that falls within a gene subset may be analyzed separately from the gene subset, or in another gene subset. For example, the expression levels of at least one, at least two, at least three, or at least four genes may be measured in addition to the gene subsets described herein. In an embodiment of the invention, the additional gene(s) are selected from STAT5B, NFAT5, AZGP1, ANPEP, IGFBP2, SLC22A3, ERG, AR, SRD5A2, GSTM1, and GSTM2.

In a specific embodiment, the method of the invention comprises measuring the expression levels of the specific combinations of genes and gene subsets shown in Table 4. In a further embodiment, gene group score(s) and quantitative score(s) are calculated according to the algorithm(s) shown in Table 4.

Various technological approaches for determination of expression levels of the disclosed genes are set forth in this specification, including, without limitation, RT-PCR, microarrays, high-throughput sequencing, serial analysis of gene expression (SAGE) and Digital Gene Expression (DGE), which will be discussed in detail below. In particular aspects, the expression level of each gene may be determined in relation to various features of the expression products of the gene including exons, introns, protein epitopes and protein activity.

The expression product that is assayed can be, for example, RNA or a polypeptide. The expression product may be fragmented. For example, the assay may use primers that are complementary to target sequences of an expression product and could thus measure full transcripts as well as those fragmented expression products containing the target sequence. Further information is provided in Table A.

The RNA expression product may be assayed directly or by detection of a cDNA product resulting from a PCR-based amplification method, e.g., quantitative reverse transcription polymerase chain reaction (qRT-PCR). (See, e.g., U.S. Pat. No. 7,587,279). Polypeptide expression product may be assayed using immunohistochemistry (IHC) or by proteomics techniques. Further, both RNA and polypeptide expression products may also be assayed using microarrays.

Methods of Assaying Expression Levels of a Gene Product

Methods of gene expression profiling include methods based on hybridization analysis of polynucleotides, methods based on sequencing of polynucleotides, and proteomics-based methods. Exemplary methods known in the art for the quantification of RNA expression in a sample include northern blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-283 (1999)); RNAse protection assays (Hod, Biotechniques 13:852-854 (1992)); and PCR-based methods, such as reverse transcription PCR (RT-PCR) (Weis et al., Trends in Genetics 8:263-264 (1992)). Antibodies may be employed that can recognize sequence-specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS). Other methods known in the art may be used.

Reverse Transcription PCR (RT-PCR)

Typically, mRNA is isolated from a test sample. The starting material is typically total RNA isolated from a human tumor, usually from a primary tumor. Optionally, normal tissues from the same patient can be used as an internal control. Such normal tissue can be histologically-appearing normal tissue adjacent to a tumor. mRNA can be extracted from a tissue sample, e.g., from a sample that is fresh, frozen (e.g. fresh frozen), or paraffin-embedded and fixed (e.g. formalin-fixed).

General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995). In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.

The sample containing the RNA is then subjected to reverse transcription to produce cDNA from the RNA template, followed by exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.

PCR-based methods use a thermostable DNA-dependent DNA polymerase, such as a Taq DNA polymerase. For example, TaqMan® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction product. A third oligonucleotide, or probe, can be designed to facilitate detection of a nucleotide sequence of the amplicon located between the hybridization sites of the two PCR primers. The probe can be detectably labeled, e.g., with a reporter dye, and can further be provided with both a fluorescent dye and a quencher fluorescent dye, as in a TaqMan® probe configuration. Where a TaqMan® probe is used, during the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.

TaqMan® RT-PCR can be performed using commercially available equipment, for example, high-throughput platforms such as the ABI PRISM 7700 Sequence Detection System® (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or LightCycler® (Roche Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the procedure is run on a LightCycler® 480 (Roche Diagnostics) real-time PCR system, which is a microwell plate-based cycler platform.

5′-Nuclease assay data are commonly initially expressed as a threshold cycle (“C_(t)”). Fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The threshold cycle (C_(t)) is generally described as the point when the fluorescent signal is first recorded as statistically significant. Alternatively, data may be expressed as a crossing point (“Cp”). The Cp value is calculated by determining the second derivatives of entire qPCR amplification curves and their maximum value. The Cp value represents the cycle at which the increase of fluorescence is highest and where the logarithmic phase of a PCR begins.

To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard gene (also referred to as a reference gene) is expressed at a quite constant level among cancerous and non-cancerous tissue of the same origin (i.e., a level that is not significantly different among normal and cancerous tissues), is not significantly affected by the experimental treatment (i.e., does not exhibit a significant difference in expression level in the relevant tissue as a result of exposure to chemotherapy), and is expressed at a quite constant level among the same tissue taken from different patients. For example, reference genes useful in the methods disclosed herein should not exhibit significantly different expression levels in cancerous prostate as compared to normal prostate tissue. Exemplary reference genes used for normalization comprise one or more of the following genes: AAMP, ARF1, ATP5E, CLTC, GPS1, and PGK1. Gene expression measurements can be normalized relative to the mean of one or more (e.g., 2, 3, 4, 5, or more) reference genes. Reference-normalized expression measurements can range from 2 to 15, where a one unit increase generally reflects a 2-fold increase in RNA quantity.
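One way the reference normalization described above might be sketched is to subtract a test gene's Cp from the mean Cp of the reference genes and add a constant, so that larger values correspond to more RNA and each unit corresponds to roughly a 2-fold change. The additive offset of 10, the function names, and the Cp values below are assumptions made for this sketch; the disclosure does not specify them.

```python
def reference_normalize(gene_cp, reference_cps, offset=10.0):
    """Normalize a gene's Cp against the mean Cp of the reference genes.

    A lower Cp means more template, so the gene's Cp is subtracted from
    the reference mean; the offset (a hypothetical constant) shifts the
    result onto a positive scale.
    """
    reference_mean = sum(reference_cps) / len(reference_cps)
    return reference_mean - gene_cp + offset


def fold_change(normalized_a, normalized_b):
    """Approximate RNA fold difference implied by two normalized values,
    assuming one unit corresponds to a 2-fold change."""
    return 2.0 ** (normalized_a - normalized_b)


# Hypothetical Cp values for three reference genes and one test gene
reference_cps = [25.0, 27.0, 26.0]
normalized = reference_normalize(28.0, reference_cps)
```

Averaging several reference genes, rather than relying on one, reduces the influence of any single reference gene whose expression happens to vary in a given sample.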

Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g., Heid et al., Genome Research 6:986-994 (1996).

The steps of a representative protocol for use in the methods of the present disclosure use fixed, paraffin-embedded tissues as the RNA source. For example, mRNA isolation, purification, primer extension and amplification can be performed according to methods available in the art (see, e.g., Godfrey et al., J. Molec. Diagnostics 2: 84-91 (2000); Specht et al., Am. J. Pathol. 158: 419-29 (2001)). Briefly, a representative process starts with cutting about 10 μm thick sections of paraffin-embedded tumor tissue samples. The RNA is then extracted, and protein and DNA are depleted from the RNA-containing sample. After analysis of the RNA concentration, RNA is reverse transcribed using gene-specific primers followed by RT-PCR to provide for cDNA amplification products.

Design of Intron-Based PCR Primers and Probes

PCR primers and probes can be designed based upon exon or intron sequences present in the mRNA transcript of the gene of interest. Primer/probe design can be performed using publicly available software, such as the DNA BLAT software developed by Kent, W. J., Genome Res. 12(4):656-64 (2002), or by the BLAST software including its variations.

Where necessary or desired, repetitive sequences of the target sequence can be masked to mitigate non-specific signals. Exemplary tools to accomplish this include the Repeat Masker program available on-line through the Baylor College of Medicine, which screens DNA sequences against a library of repetitive elements and returns a query sequence in which the repetitive elements are masked. The masked intron sequences can then be used to design primer and probe sequences using any commercially or otherwise publicly available primer/probe design packages, such as Primer Express (Applied Biosystems); MGB assay-by-design (Applied Biosystems); and Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers. See S. Krawetz, S. Misener, Bioinformatics Methods and Protocols: Methods in Molecular Biology, pp. 365-386 (Humana Press)).

Other factors that can influence PCR primer design include primer length, melting temperature (Tm), G/C content, specificity, complementary primer sequences, and 3′-end sequence. In general, optimal PCR primers are 17-30 bases in length, contain about 20-80%, such as, for example, about 50-60%, G+C bases, and exhibit Tm's between 50 and 80° C., e.g., about 50 to 70° C.
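As an illustration only, the numeric guidelines above (length, G+C content, and Tm) can be expressed as a simple screening function. The function names are hypothetical, and the melting temperature is taken as an input rather than computed, because the text does not specify a Tm formula. Specificity, primer complementarity, and 3′-end sequence are not checked in this sketch.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    if not seq:
        raise ValueError("empty primer sequence")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def primer_within_guidelines(seq, tm_celsius):
    """Check a primer against the general ranges stated in the text.

    tm_celsius is supplied by the caller (e.g., from a design package);
    this sketch does not estimate it.
    """
    length_ok = 17 <= len(seq) <= 30            # 17-30 bases
    gc_ok = 20.0 <= gc_percent(seq) <= 80.0     # about 20-80% G+C
    tm_ok = 50.0 <= tm_celsius <= 80.0          # Tm between 50 and 80 C
    return length_ok and gc_ok and tm_ok
```

For example, a hypothetical 20-base primer `"ATGCATGCATGCATGCATGC"` has 50% G+C content and passes the screen at a supplied Tm of 60° C., but fails at a supplied Tm of 45° C.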

For further guidelines for PCR primer and probe design see, e.g., Dieffenbach, C. W. et al., “General Concepts for PCR Primer Design” in: PCR Primer, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1995, pp. 133-155; Innis and Gelfand, “Optimization of PCRs” in: PCR Protocols, A Guide to Methods and Applications, CRC Press, London, 1994, pp. 5-11; and Plasterer, T. N. Primerselect: Primer and probe design. Methods Mol. Biol. 70:520-527 (1997), the entire disclosures of which are hereby expressly incorporated by reference.

Table A provides further information concerning the primer, probe, and amplicon sequences associated with the Examples disclosed herein.

MassARRAY® System

In MassARRAY-based methods, such as the exemplary method developed by Sequenom, Inc. (San Diego, Calif.), following the isolation of RNA and reverse transcription, the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions except a single base, and serves as an internal standard. The cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides. After inactivation of the alkaline phosphatase, the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for analysis by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS). The cDNA present in the reaction is then quantified by analyzing the ratios of the peak areas in the mass spectrum generated. For further details see, e.g., Ding and Cantor, Proc. Natl. Acad. Sci. USA 100:3059-3064 (2003).
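The peak-area ratio step can be sketched as follows. This is a simplified illustration, not the vendor's algorithm: it assumes the competitor and the cDNA amplify with equal efficiency and that peak area is proportional to the amount of product, and the function and parameter names are hypothetical.

```python
def cdna_amount(competitor_amount, cdna_peak_area, competitor_peak_area):
    """Estimate the cDNA amount from a known amount of spiked competitor.

    Assumes equal amplification efficiency for cDNA and competitor and a
    linear relationship between peak area and product amount.
    """
    if competitor_peak_area <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_amount * cdna_peak_area / competitor_peak_area
```

Under these assumptions, a cDNA peak with twice the area of the competitor peak implies twice the competitor's known amount, so `cdna_amount(100.0, 3.0, 1.5)` evaluates to 200.0 in the same units as the competitor amount.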

Other PCR-Based Methods

Further PCR-based techniques that can find use in the methods disclosed herein include, for example, BeadArray® technology (Illumina, San Diego, Calif.; Oliphant et al., Discovery of Markers for Disease (Supplement to Biotechniques), June 2002; Ferguson et al., Analytical Chemistry 72:5618 (2000)); BeadsArray for Detection of Gene Expression® (BADGE), using the commercially available Luminex100 LabMAP® system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression (Yang et al., Genome Res. 11:1888-1898 (2001)); and high coverage expression profiling (HiCEP) analysis (Fukumura et al., Nucl. Acids. Res. 31(16) e94 (2003)).

Microarrays

Expression levels of a gene of interest can also be assessed using the microarray technique. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are arrayed on a substrate. The arrayed sequences are then contacted under conditions suitable for specific hybridization with detectably labeled cDNA generated from RNA of a test sample. As in the RT-PCR method, the source of RNA typically is total RNA isolated from a tumor sample, and optionally from normal tissue of the same patient as an internal control or cell lines. RNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g., formalin-fixed) tissue samples.

For example, PCR amplified inserts of cDNA clones of a gene to be assayed are applied to a substrate in a dense array. Usually at least 10,000 nucleotide sequences are applied to the substrate. For example, the microarrayed genes, immobilized on the microchip at 10,000 elements each, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After washing under stringent conditions to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding RNA abundance.

With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip® technology, or Incyte's microarray technology.

Serial Analysis of Gene Expression (SAGE)

Serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many transcripts are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997).

Gene Expression Analysis by Nucleic Acid Sequencing

Nucleic acid sequencing technologies are suitable methods for analysis of gene expression. The principle underlying these methods is that the number of times a cDNA sequence is detected in a sample is directly related to the relative expression of the RNA corresponding to that sequence. These methods are sometimes referred to by the term Digital Gene Expression (DGE) to reflect the discrete numeric property of the resulting data. Early methods applying this principle were Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). See, e.g., S. Brenner, et al., Nature Biotechnology 18(6):630-634 (2000). More recently, the advent of “next-generation” sequencing technologies has made DGE simpler, higher throughput, and more affordable. As a result, more laboratories are able to utilize DGE to screen the expression of more genes in more individual patient samples than previously possible. See, e.g., J. Marioni, Genome Research 18(9):1509-1517 (2008); R. Morin, Genome Research 18(4):610-621 (2008); A. Mortazavi, Nature Methods 5(7):621-628 (2008); N. Cloonan, Nature Methods 5(7):613-619 (2008).
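The counting principle behind DGE can be illustrated with a minimal sketch. It assumes that each sequenced read or tag has already been assigned to a gene by some upstream step not shown here; the function name and the use of a simple fraction of total counts as the measure of relative expression are illustrative choices, not a method specified in this disclosure.

```python
from collections import Counter


def relative_expression(read_assignments):
    """Convert per-read gene assignments into fractions of total reads.

    read_assignments is an iterable of gene symbols, one per read or tag.
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {gene: n / total for gene, n in counts.items()}
```

For a hypothetical sample in which three reads map to KLK2 and one to TPX2, the function returns 0.75 for KLK2 and 0.25 for TPX2.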

Isolating RNA from Body Fluids

Methods of isolating RNA for expression analysis from blood, plasma and serum (see, e.g., K. Enders, et al., Clin Chem 48, 1647-53 (2002) and references cited therein) and from urine (see, e.g., R. Boom, et al., J Clin Microbiol. 28, 495-503 (1990) and references cited therein) have been described.

Immunohistochemistry

Immunohistochemistry methods are also suitable for detecting the expression levels of genes and can be applied to the method disclosed herein. Antibodies (e.g., monoclonal antibodies) that specifically bind a gene product of a gene of interest can be used in such methods. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody can be used in conjunction with a labeled secondary antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.

Proteomics

The term “proteome” is defined as the totality of the proteins present in a sample (e.g., tissue, organism, or cell culture) at a certain point of time. Proteomics includes, among other things, study of the global changes of protein expression in a sample (also referred to as “expression proteomics”). Proteomics typically includes the following steps: (1) separation of individual proteins in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from the gel, e.g., by mass spectrometry or N-terminal sequencing; and (3) analysis of the data using bioinformatics.

General Description of the mRNA Isolation, Purification and Amplification

The steps of a representative protocol for profiling gene expression using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, purification, primer extension and amplification, are provided in various published journal articles. (See, e.g., T. E. Godfrey, et al., J. Molec. Diagnostics 2: 84-91 (2000); K. Specht et al., Am. J. Pathol. 158: 419-29 (2001); M. Cronin, et al., Am J Pathol 164:35-42 (2004)). Briefly, a representative process starts with cutting a tissue sample section (e.g., about 10 μm thick sections of a paraffin-embedded tumor tissue sample). The RNA is then extracted, and protein and DNA are removed. After analysis of the RNA concentration, RNA repair is performed if desired. The sample can then be subjected to analysis, e.g., by reverse transcription using gene-specific primers followed by RT-PCR.

Statistical Analysis of Expression Levels in Identification of Genes

One skilled in the art will recognize that there are many statistical methods that may be used to determine whether there is a significant relationship between a clinical outcome of interest (e.g., recurrence) and expression levels of a marker gene as described here. In an exemplary embodiment, the present invention includes three studies. The first study is a stratified cohort sampling design (a form of case-control sampling) using tissue and data from prostate cancer patients. Selection of specimens was stratified by clinical T-stage (T1, T2), year of surgery (<1993, ≥1993), and prostatectomy Gleason Score (low/intermediate, high). All patients with clinical recurrence were selected and a stratified random sample of patients who did not experience a clinical recurrence was selected. For each patient, up to two enriched tumor specimens and one normal-appearing tissue sample were assayed. The second study used a subset of 70 patients from the first study from whom matched prostate biopsy tumor tissue was assayed. The third study includes all patients (170 evaluable patients) who had surgery for their prostate cancer between 1999 and 2010 at the Cleveland Clinic (CC) and had Low or Intermediate risk (by AUA) clinically localized prostate cancer, who might have been reasonable candidates for active surveillance but who underwent RP at CC within 6 months of the diagnosis of prostate cancer by biopsy. Biopsy tumor tissue from these patients was assayed.

All hypothesis tests were reported using two-sided p-values. To investigate whether there is a significant relationship of outcomes (e.g., clinical recurrence-free interval (cRFI), biochemical recurrence-free interval (bRFI), prostate cancer-specific survival (PCSS), and overall survival (OS)) with individual genes and demographic or clinical covariates, Cox Proportional Hazards (PH) models using maximum weighted pseudo partial-likelihood estimators were used, and p-values from Wald tests of the null hypothesis that the hazard ratio (HR) is one are reported. To investigate whether there is a significant relationship between individual genes and Gleason pattern of a particular sample, ordinal logistic regression models using maximum weighted pseudolikelihood methods were used, and p-values from Wald tests of the null hypothesis that the odds ratio (OR) is one are reported. To investigate whether there is a significant relationship between individual genes and upgrading and/or upstaging or adverse pathology at RP, logistic regression models using maximum weighted pseudolikelihood methods were used, and p-values from Wald tests of the null hypothesis that the odds ratio (OR) is one are reported.
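The Wald test referred to above can be illustrated with a short sketch. It assumes that a log hazard ratio (or log odds ratio) estimate and its standard error have already been obtained from a fitted model; the weighted pseudolikelihood fitting itself is not shown, and the function name is hypothetical. The two-sided p-value uses the standard normal reference distribution.

```python
import math


def wald_test(beta, se):
    """Two-sided Wald test of the null hypothesis that exp(beta) equals one.

    beta is the estimated log hazard ratio (or log odds ratio) and se is its
    standard error. Returns (ratio, z statistic, two-sided p-value).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(beta), z, p_value
```

For instance, a hypothetical estimate of `beta = log(2)` with a standard error chosen so that z is about 1.96 gives a hazard ratio of 2 and a two-sided p-value of about 0.05.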

Coexpression Analysis

In an exemplary embodiment, the joint correlation of gene expression levels among prostate cancer specimens under study may be assessed. For this purpose, the correlation structures among genes and specimens may be examined through hierarchical cluster methods. This information may be used to confirm that genes that are known to be highly correlated in prostate cancer specimens cluster together as expected. Only genes exhibiting a nominally significant (unadjusted p<0.05) relationship with cRFI in the univariate Cox PH regression analysis are included in these analyses.

One skilled in the art will recognize that many co-expression analysis methods now known or later developed will fall within the scope and spirit of the present invention. These methods may incorporate, for example, correlation coefficients, co-expression network analysis, clique analysis, etc., and may be based on expression data from RT-PCR, microarrays, sequencing, and other similar technologies. For example, gene expression clusters can be identified using pairwise analysis of correlation based on Pearson or Spearman correlation coefficients. (See, e.g., Pearson K. and Lee A., Biometrika 2, 357 (1902); C. Spearman, Amer. J. Psychol 15:72-101 (1904); J. Myers, A. Well, Research Design and Statistical Analysis, p. 508 (2nd Ed., 2003).) An exemplary method for identifying co-expressed genes is described in Example 3 below.
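A pairwise correlation analysis of the kind mentioned above might be sketched as follows. This is a generic illustration and not the method of Example 3: the function names, the gene labels in the usage note, and the correlation threshold of 0.7 are assumptions made here, and only the Pearson coefficient is shown.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(expr, threshold=0.7):
    """List gene pairs whose expression profiles correlate at or above threshold.

    expr maps gene symbol -> list of expression values, one per specimen,
    in the same specimen order for every gene. The threshold is hypothetical.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson_r(expr[g1], expr[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs
```

With placeholder profiles `{"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}`, only the pair A and B is reported, since C is perfectly anti-correlated with both.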

Normalization of Expression Levels

The expression data used in the methods disclosed herein can be normalized. Normalization refers to a process to correct for (normalize away), for example, differences in the amount of RNA assayed and variability in the quality of the RNA used, to remove unwanted sources of systematic variation in Ct or Cp measurements, and the like. With respect to RT-PCR experiments involving archived fixed paraffin embedded tissue samples, sources of systematic variation are known to include the degree of RNA degradation relative to the age of the patient sample and the type of fixative used to store the sample. Other sources of systematic variation are attributable to laboratory processing conditions.

Assays can provide for normalization by incorporating the expression of certain normalizing genes, which do not significantly differ in expression levels under the relevant conditions. Exemplary normalization genes disclosed herein include housekeeping genes. (See, e.g., E. Eisenberg, et al., Trends in Genetics 19(7):362-365 (2003).) Normalization can be based on the mean or median signal (Ct or Cp) of all of the assayed genes or a large subset thereof (global normalization approach). In general, the normalizing genes, also referred to as reference genes, are genes that are known not to exhibit meaningfully different expression in prostate cancer as compared to non-cancerous prostate tissue, and that track with various sample and process conditions, thus providing for normalizing away extraneous effects.

In exemplary embodiments, one or more of the following genes are used as references by which the mRNA expression data is normalized: AAMP, ARF1, ATP5E, CLTC, GPS1, and PGK1. The calibrated weighted average Ct or Cp measurements for each of the prognostic and predictive genes may be normalized relative to the mean of five or more reference genes.

Those skilled in the art will recognize that normalization may be achieved in numerous ways, and the techniques described above are intended only to be exemplary, not exhaustive.

Standardization of Expression Levels

The expression data used in the methods disclosed herein can be standardized. Standardization refers to a process to effectively put all the genes on a comparable scale. This is performed because some genes will exhibit more variation (a broader range of expression) than others. Standardization is performed by dividing each expression value by its standard deviation across all samples for that gene. Hazard ratios are then interpreted as the proportional change in the hazard for the clinical endpoint (clinical recurrence, biochemical recurrence, death due to prostate cancer, or death due to any cause) per 1 standard deviation increase in expression.
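Standardization as described above can be sketched in a few lines. The function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimator is used. The second function shows how a log hazard ratio expressed per unit of expression relates to a hazard ratio per standard deviation.

```python
import math
from statistics import stdev


def standardize(values):
    """Divide each expression value for one gene by that gene's SD across samples.

    Uses the sample standard deviation (n - 1 denominator), an assumption.
    """
    sd = stdev(values)
    if sd == 0:
        raise ValueError("gene shows no variation across samples")
    return [v / sd for v in values]


def hazard_ratio_per_sd(log_hr_per_unit, sd):
    """Hazard ratio for a 1 SD increase, given the log HR per expression unit."""
    return math.exp(log_hr_per_unit * sd)
```

For hypothetical values `[2.0, 4.0, 6.0]` the sample SD is 2.0, so the standardized values are `[1.0, 2.0, 3.0]`; a hazard ratio of 2 per expression unit then corresponds to a hazard ratio of 4 per standard deviation.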

Kits of the Invention

The materials for use in the methods of the present invention are suited for preparation of kits produced in accordance with well-known procedures. The present disclosure thus provides kits comprising agents, which may include gene-specific or gene-selective probes and/or primers, for quantifying the expression of the disclosed genes for predicting prognostic outcome or response to treatment. Such kits may optionally contain reagents for the extraction of RNA from tumor samples, in particular fixed paraffin-embedded tissue samples, and/or reagents for RNA amplification. In addition, the kits may optionally comprise the reagent(s) with an identifying description or label or instructions relating to their use in the methods of the present invention. The kits may comprise containers (including microtiter plates suitable for use in an automated implementation of the method), each with one or more of the various materials or reagents (typically in concentrated form) utilized in the methods, including, for example, chromatographic columns, pre-fabricated microarrays, buffers, the appropriate nucleotide triphosphates (e.g., dATP, dCTP, dGTP and dTTP; or rATP, rCTP, rGTP and UTP), reverse transcriptase, DNA polymerase, RNA polymerase, and one or more probes and primers of the present invention (e.g., appropriate length poly(T) or random primers linked to a promoter reactive with the RNA polymerase). Mathematical algorithms used to estimate or quantify prognostic or predictive information are also potential components of kits.

Reports

The methods of this invention, when practiced for commercial diagnostic purposes, generally produce a report or summary of information obtained from the herein-described methods. For example, a report may include information concerning expression levels of one or more genes, classification of the tumor or the patient's risk of recurrence, the patient's likely prognosis or risk classification, clinical and pathologic factors, and/or other information. The methods and reports of this invention can further include storing the report in a database. The method can create a record in a database for the subject and populate the record with data. The report may be a paper report, an auditory report, or an electronic record. The report may be displayed and/or stored on a computing device (e.g., handheld device, desktop computer, smart device, website, etc.). It is contemplated that the report is provided to a physician and/or the patient. The receiving of the report can further include establishing a network connection to a server computer that includes the data and report and requesting the data and report from the server computer.

Computer Program

The values from the assays described above, such as expression data, can be calculated and stored manually. Alternatively, the above-described steps can be completely or partially performed by a computer program product. The present invention thus provides a computer program product including a computer readable storage medium having a computer program stored on it. The program can, when read by a computer, execute relevant calculations based on values obtained from analysis of one or more biological samples from an individual (e.g., gene expression levels, normalization, standardization, thresholding, and conversion of values from assays to a score and/or text or graphical depiction of tumor stage and related information). The computer program product has stored therein a computer program for performing the calculation.

The present disclosure provides systems for executing the program described above, which system generally includes: a) a central computing environment; b) an input device, operatively connected to the computing environment, to receive patient data, wherein the patient data can include, for example, expression level or other value obtained from an assay using a biological sample from the patient, or microarray data, as described in detail above; c) an output device, connected to the computing environment, to provide information to a user (e.g., medical personnel); and d) an algorithm executed by the central computing environment (e.g., a processor), where the algorithm is executed based on the data received by the input device, and wherein the algorithm calculates an expression score, thresholding, or other functions described herein. The methods provided by the present invention may also be automated in whole or in part.

Having described the invention, the same will be more readily understood through reference to the following Examples, which are provided by way of illustration, and are not intended to limit the invention in any way.

EXAMPLES

Example 1: Selection of 81 Genes for Algorithm Development

A gene identification study to identify genes associated with clinical recurrence, biochemical recurrence and/or death from prostate cancer is described in U.S. Provisional Application Nos. 61/368,217, filed Jul. 27, 2010; 61/414,310, filed Nov. 16, 2010; and 61/485,536, filed May 12, 2011, and in U.S. Pub. No. 20120028264, filed Jul. 25, 2011, and published Feb. 2, 2012 (all of which are hereby incorporated by reference). RT-PCR analysis was used to determine RNA expression levels for 732 genes and reference genes in prostate cancer tissue and surrounding normal appearing tissue (NAT) in patients with early-stage prostate cancer treated with radical prostatectomy. Genes significantly associated (p<0.05) with clinical recurrence-free interval (cRFI), biochemical recurrence-free interval (bRFI), prostate cancer-specific survival (PCSS), and upgrading/upstaging were determined.

From the genes that were identified as being associated with outcome, 81 genes were selected for subsequent algorithm development. The primers, probes, and amplicon sequences of the 81 genes (and 5 reference genes) are listed in Table A. The genes selected were among the most prognostic with respect to cRFI and other properties and are shown in Tables 1A-1B. Other properties considered were: 1) Strongest genes with respect to the regression to the mean corrected standardized hazard ratio for the association of gene expression and cRFI in the primary Gleason pattern tumor; 2) Consistency in association (hazard ratio) with cRFI using the highest Gleason pattern tumor; 3) Associated with prostate cancer-specific survival (PCSS); 4) Strong hazard ratio after adjustment for the University of California, San Francisco Cancer of the Prostate Risk Assessment (CAPRA) (Cooperberg et al., J. Urol. 173:1938-1942, 2005); 5) Statistically significant odds ratio for the association between gene expression and surgical Gleason pattern of the tumor; 6) Large overall variability, with greater between-patient variability than within-patient variability preferable; and 7) Highly expressed.

The true discovery rate degree of association (TDRDA) method (Crager, Stat Med. 2010 Jan. 15; 29(1):33-45) was used in the analysis of gene expression and cRFI, and results are shown in Table 1A. The true discovery rate is the counterpart to the false discovery rate. Univariate Cox PH regression models were fit and the TDRDA method was used to correct estimated standardized hazard ratios for regression to the mean (RM) and assess false discovery rates for identification of genes with absolute standardized hazard ratio of at least a specified level. The false discovery rates were controlled at 10%. The TDRDA method identifies sets of genes among which a specified proportion are expected to have an absolute association (here, the absolute standardized hazard ratio) of a specified degree or more. This leads to a gene ranking method that uses the maximum lower bound (MLB) degree of association for which each gene belongs to a TDRDA set. Estimates of each gene's actual degree of association with approximate correction for “selection bias” due to regression to the mean can be derived using simple bivariate normal theory and Efron and Tibshirani's empirical Bayes approach. Efron, Annals of Applied Statistics 2:197-223 (2008); Efron and Tibshirani, Genetic Epidemiology 23: 70-86. Table 1A shows the RM-corrected estimate of the standardized hazard ratio and the MLB for each gene using either the primary Gleason pattern (PGP) or highest Gleason pattern (HGP) sample gene expression. Genes marked with a direction of association of −1 are associated with a reduced likelihood of clinical recurrence, while those marked with a direction of association of 1 are associated with an increased likelihood of clinical recurrence.

Within-patient and between-patient variance components were estimated using a mixed model treating the patient effect as random. The overall mean and standard deviation of normalized gene expression as well as within- and between-patient components of variance are shown in Table 1A.

Univariate Cox PH regression models using maximum weighted partial pseudolikelihood estimation were used to estimate the association between gene expression and prostate cancer-specific survival (PCSS). The standardized hazard ratio (HR), p-value and q-value using Storey's FDR method are reported in Table 1B. Storey, Journal of the Royal Statistical Society, Series B 64:479-498 (2002). The q-value can be interpreted as the empirical Bayes posterior probability given the data that the gene identified is a false discovery, that is, the probability that it has no association with clinical recurrence.

Univariate ordinal logistic regression models were used to estimate the association between gene expression and the Gleason pattern of the primary Gleason pattern tumor (3, 4, 5). The standardized odds ratio (OR), p-value and q-value using Storey's FDR method are reported in Table 1B.

FIG. 1 shows an example of a dendrogram depicting the association of the 81 genes. The y-axis corresponds to the average distance between clusters measured as 1 − Pearson r. The smaller the number (distance measure), the more highly correlated the genes. The amalgamation method is weighted pair-group average. Genes that were co-expressed were identified from the dendrogram and are grouped into gene groups. Based on FIG. 1, the genes from the Gene Identification study were formed into the following gene groups or subsets:

Cellular organization gene group (BIN1; IGF1; C7; GSN; DES; TGFB1I1; TPM2; VCL; FLNC; ITGA7; COL6A1; PPP1R12A; GSTM1; GSTM2; PAGE4; PPAP2B; SRD5A2; PRKCA; IGFBP6; GPM6B; OLFML3; HLF)

Basal epithelia gene group (CYP3A5; KRT15; KRT5; LAMB3; SDC1)

Stress response gene group (DUSP1; EGR1; FOS; JUN; EGR3; GADD45B; ZFP36)

Androgen gene group (FAM13C; KLK2; AZGP1; SRD5A2)

Stromal gene group (ASPN; SFRP4; BGN; THBS2; INHBA; COL1A1; COL3A1; COL1A2; SPARC; COL8A1; COL4A1; FN1; FAP; COL5A2)

Proliferation gene group (CDC20; TPX2; UBE2T; MYBL2; CDKN2C)
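The clustering behind a dendrogram of this kind can be sketched in general terms. The following is a minimal, illustrative implementation of weighted pair-group average (WPGMA) agglomeration on a precomputed distance table, where each distance would be 1 − Pearson r between two genes. It is not the software used to produce FIG. 1; the function name, the input layout, and the placeholder gene labels in the usage note are assumptions.

```python
def wpgma(labels, dist):
    """Agglomerate clusters by weighted pair-group average (WPGMA).

    labels: list of gene names.
    dist: dict mapping frozenset({gene_a, gene_b}) -> distance (e.g., 1 - r).
    Returns the merges in order as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of gene names.
    """
    clusters = [(name,) for name in labels]
    # Re-key the distances by singleton clusters.
    d = {
        frozenset({(a,), (b,)}): dist[frozenset({a, b})]
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        height = d[frozenset({a, b})]
        merged = a + b
        rest = [c for c in clusters if c not in (a, b)]
        # WPGMA update: distance to the merged cluster is the simple average
        # of the distances to its two parts, regardless of their sizes.
        for c in rest:
            d[frozenset({merged, c})] = 0.5 * (
                d[frozenset({a, c})] + d[frozenset({b, c})]
            )
        clusters = rest + [merged]
        merges.append((a, b, height))
    return merges
```

With three placeholder genes G1, G2, G3 and distances of 0.1 (G1, G2), 0.5 (G1, G3), and 0.7 (G2, G3), the sketch first joins G1 and G2 at height 0.1 and then joins G3 to that cluster at height 0.6, the average of 0.5 and 0.7. Gene groups would then be read off by cutting the resulting tree at a chosen height.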

TABLE 1A

Association with cR in PGP sample and in HGP sample

GENE | Mean normalized Cp | SD normalized Cp | Between-patient variance | Within-patient variance | Total variance | PGP Direction of Association | PGP Absolute RM-Corrected HR | PGP MLB | HGP Direction of Association | HGP Absolute RM-Corrected HR | HGP MLB
ARF1 | 11.656805 | 0.2475574 | 0.04488 | 0.01646 | 0.0613399 | −1 | 1.0849132 | . | −1 | 1.2006385 | 1.0397705
ATP5E | 10.896515 | 0.2667133 | 0.05141 | 0.01979 | 0.0711992 | 1 | 1.2599111 | 1.0908967 | 1 | 1.3877428 | 1.165325
CLTC | 10.597008 | 0.1780634 | 0.01554 | 0.01619 | 0.0317257 | 1 | 1.0896793 | 1.002002 | 1 | 1.2292343 | 1.0554846
GPS1 | 9.2927019 | 0.2169116 | 0.03181 | 0.01528 | 0.0470897 | 1 | 1.0064191 | . | −1 | 1.1089053 | .
PGK1 | 8.3680642 | 0.2655009 | 0.04957 | 0.02099 | 0.0705517 | 1 | 1.1174152 | 1.0151131 | 1 | 1.2149168 | 1.0607752
ASPN | 5.4846081 | 1.1701993 | 0.4981 | 0.8719 | 1.3699944 | 1 | 1.7716283 | 1.4276075 | 1 | 1.7114564 | 1.4007391
BGN | 11.299746 | 0.7357491 | 0.3259 | 0.2159 | 0.5417276 | 1 | 1.6133066 | 1.3271052 | 1 | 1.7312777 | 1.4007391
COL1A1 | 11.325411 | 0.8840402 | 0.4748 | 0.3073 | 0.7821127 | 1 | 1.6162028 | 1.3284329 | 1 | 1.7982985 | 1.4304656
COL1A2 | 10.093055 | 0.8232027 | 0.4321 | 0.2461 | 0.6781941 | 1 | 1.1319748 | 1.0222438 | 1 | 1.3449362 | 1.136553
COL3A1 | 11.007109 | 0.7944239 | 0.352 | 0.2795 | 0.6315424 | 1 | 1.5695255 | 1.2969301 | 1 | 1.7133767 | 1.4007391
COL4A1 | 7.8408647 | 0.6731713 | 0.2393 | 0.2142 | 0.4534598 | 1 | 1.3297169 | 1.14225 | 1 | 1.4292283 | 1.1996142
COL5A2 | 5.2708574 | 0.9571692 | 0.4 | 0.5166 | 0.9166661 | 1 | 1.1715343 | 1.0408108 | 1 | 1.1822568 | 1.0408108
F2R | 7.0775127 | 1.0110657 | 0.5529 | 0.47 | 1.022934 | 1 | 1.5019815 | 1.2636445 | 1 | 1.4888813 | 1.2361479
FAP | 5.0493366 | 1.1898915 | 0.6577 | 0.759 | 1.4166546 | 1 | 1.3007869 | 1.1162781 | 1 | 1.3882726 | 1.1641602
FN1 | 9.5176438 | 0.7224014 | 0.3059 | 0.2163 | 0.5222401 | 1 | 1.0505668 | . | 1 | 1.1524617 | 1.0253151
INHBA | 5.8059993 | 1.2653019 | 0.9629 | 0.6392 | 1.6021763 | 1 | 1.896185 | 1.4858693 | 1 | 2.1859455 | 1.7177237
SFRP4 | 7.8225007 | 1.2053184 | 0.7997 | 0.6541 | 1.4537934 | 1 | 1.5382115 | 1.2763443 | 1 | 1.5692525 | 1.2969301
SPARC | 10.544556 | 0.7978856 | 0.4311 | 0.206 | 0.6371517 | 1 | 1.3683299 | 1.1711662 | 1 | 1.6187451 | 1.3324242
THBS2 | 4.7779897 | 1.0825934 | 0.7121 | 0.4608 | 1.1728864 | 1 | 1.5523887 | 1.2904616 | 1 | 1.6829249 | 1.3785056
BIN1 | 7.8741434 | 0.840604 | 0.4445 | 0.2627 | 0.7071728 | −1 | 1.5631385 | 1.2930451 | −1 | 1.3294226 | 1.1185129
C7 | 8.4895479 | 1.1083704 | 0.6917 | 0.5377 | 1.2293358 | −1 | 1.5658393 | 1.2687092 | −1 | 1.4724885 | 1.220182
COL6A1 | 7.3615421 | 0.848837 | 0.3381 | 0.3828 | 0.7209474 | −1 | 1.5439152 | 1.2687092 | −1 | 1.2634411 | 1.0746553
DES | 11.967287 | 0.896286 | 0.4101 | 0.3938 | 0.8038418 | −1 | 1.5007183 | 1.2386227 | −1 | 1.3032825 | 1.0963648
FLNC | 8.6795128 | 1.0679528 | 0.572 | 0.5692 | 1.1412391 | −1 | 1.2696693 | 1.0650268 | −1 | 1.2353942 | 1.0491707
GPM6B | 7.9402089 | 0.9453416 | 0.4441 | 0.4501 | 0.8942266 | −1 | 1.4471085 | 1.2056273 | −1 | 1.4931412 | 1.2386227
GSN | 8.8308175 | 0.7789199 | 0.3756 | 0.2316 | 0.6071864 | −1 | 1.6223835 | 1.3073471 | −1 | 1.3639212 | 1.136553
GSTM1 | 6.4175398 | 1.2720365 | 1.0519 | 0.5675 | 1.619374 | −1 | 1.5226009 | 1.2560853 | −1 | 1.5822193 | 1.2969301
GSTM2 | 7.2950478 | 1.0014875 | 0.5069 | 0.4967 | 1.0036007 | −1 | 1.6694102 | 1.3284329 | −1 | 1.4815895 | 1.220182
HLF | 5.0774106 | 0.9618562 | 0.4516 | 0.4741 | 0.9257242 | −1 | 1.6225351 | 1.2956338 | −1 | 1.5817614 | 1.2891718
IGF1 | 7.6180418 | 1.1441945 | 0.7925 | 0.5177 | 1.3101729 | −1 | 1.4732764 | 1.2165269 | −1 | 1.6861196 | 1.359341
IGFBP6 | 7.0089783 | 1.0816262 | 0.6302 | 0.5405 | 1.1707038 | −1 | 1.5534588 | 1.257342 | −1 | 1.1860594 | 1.0273678
ITGA7 | 7.3299653 | 0.8913845 | 0.4034 | 0.3916 | 0.7950725 | −1 | 1.5556326 | 1.2636445 | −1 | 1.3587711 | 1.1308844
OLFML3 | 8.1932023 | 0.8189012 | 0.4 | 0.2711 | 0.6710997 | −1 | 1.5254982 | 1.2649088 | −1 | 1.3364894 | 1.1263699
PAGE4 | 7.406255 | 1.4889881 | 1.3023 | 0.9164 | 2.2187195 | −1 | 1.6316984 | 1.2969301 | −1 | 1.5178657 | 1.2435871
PPAP2B | 8.8879191 | 0.7884647 | 0.3841 | 0.238 | 0.6221573 | −1 | 1.5664582 | 1.2649088 | −1 | 1.4703629 | 1.2068335
PPP1R12A | 9.369152 | 0.5056735 | 0.1361 | 0.1198 | 0.2558759 | −1 | 1.4273047 | 1.1711662 | −1 | 1.3719705 | 1.1297541
PRKCA | 7.4654299 | 0.7936779 | 0.2984 | 0.3319 | 0.6302916 | −1 | 1.4498244 | 1.1960207 | −1 | 1.2221961 | 1.0554846
SRD5A2 | 5.7878904 | 1.2691925 | 0.8776 | 0.7343 | 1.6119317 | −1 | 1.8236528 | 1.4276075 | −1 | 1.723879 | 1.4007391
VCL | 8.9766979 | 0.720267 | 0.2667 | 0.2524 | 0.5191126 | −1 | 1.526093 | 1.2423441 | −1 | 1.4080433 | 1.1525766
TGFB1I1 | 8.2191469 | 0.7816114 | 0.3143 | 0.297 | 0.6113098 | −1 | 1.4793989 | 1.2104595 | −1 | 1.3608249 | 1.1229959
TPM2 | 11.83198 | 0.8626062 | 0.4118 | 0.3328 | 0.7446062 | −1 | 1.5739312 | 1.2687092 | −1 | 1.4709656 | 1.2116705
TPX2 | 4.552336 | 1.0856094 | 0.4888 | 0.6903 | 1.1791549 | 1 | 1.6454619 | 1.3444702 | 1 | 1.886703 | 1.4829005
CDC20 | 3.5608774 | 0.7971167 | 0.1978 | 0.4378 | 0.6356438 | 1 | 1.431589 | 1.232445 | 1 | 1.568564 | 1.3271052
CDKN2C | 3.5214932 | 0.6898842 | 0.137 | 0.3391 | 0.476113 | 1 | 1.4751885 | 1.257342 | 1 | 1.6210275 | 1.3485096
MYBL2 | 3.7669784 | 0.981064 | 0.4061 | 0.5569 | 0.9629897 | 1 | 1.5421274 | 1.2969301 | 1 | 1.5553089 | 1.3086551
UBE2T | 3.5015369 | 0.7220453 | 0.1927 | 0.3289 | 0.5215967 | 1 | 1.5156185 | 1.272521 | 1 | 1.2186754 | 1.0618365
CYP3A5 | 4.5549862 | 1.3365744 | 0.5786 | 1.2085 | 1.7871462 | −1 | 1.851997 | 1.4276075 | −1 | 1.8669598 | 1.4304656
KRT15 | 8.1889409 | 1.8188542 | 1.3539 | 1.956 | 3.3098961 | −1 | 1.4985779 | 1.2423441 | −1 | 1.7380356 | 1.4007391
KRT5 | 7.046586 | 1.528426 | 0.8388 | 1.4983 | 2.3371203 | −1 | 1.4656121 | 1.2140963 | −1 | 1.6810573 | 1.3702593
LAMB3 | 6.3566958 | 1.3451305 | 0.6358 | 1.1744 | 1.8101739 | −1 | 1.435434 | 1.1996142 | −1 | 1.5003356 | 1.2287532
EGR1 | 12.925851 | 1.0521413 | 0.7737 | 0.3343 | 1.1079529 | −1 | 1.5840317 | 1.2687092 | −1 | 1.5773643 | 1.2801791
FOS | 12.383619 | 1.1226481 | 0.8869 | 0.3746 | 1.2614295 | −1 | 1.580311 | 1.2687092 | −1 | 1.655218 | 1.3310925
GADD45B | 8.8760029 | 1.2001535 | 0.893 | 0.5485 | 1.4414667 | −1 | 1.5331503 | 1.2460767 | −1 | 1.490302 | 1.220182
JUN | 11.184249 | 1.027684 | 0.8099 | 0.2473 | 1.0571304 | −1 | 1.5816073 | 1.2649088 | −1 | 1.5652799 | 1.2649088
ZFP36 | 12.472841 | 1.1025865 | 0.8095 | 0.4072 | 1.2167102 | −1 | 1.5169178 | 1.2423441 | −1 | 1.580749 | 1.2840254
DUSP1 | 11.51936 | 0.8297195 | 0.439 | 0.25 | 0.6889744 | −1 | 1.3986201 | 1.1502738 | −1 | 1.40278 | 1.1525766
EGR3 | 9.7545461 | 1.3366461 | 1.2104 | 0.5778 | 1.7881414 | −1 | 1.5114362 | 1.2349124 | −1 | 1.4927024 | 1.2226248
FAM13C | 7.4923611 | 0.9318455 | 0.58 | 0.2891 | 0.8690619 | −1 | 1.678403 | 1.3716303 | −1 | 1.7981441 | 1.4520843
KLK2 | 14.718412 | 0.6677811 | 0.3087 | 0.1376 | 0.4463114 | −1 | 1.4443314 | 1.1936311 | −1 | 1.4966995 | 1.2361479
ALDH1A2 | 5.8436909 | 1.0515294 | 0.4984 | 0.6079 | 1.1063379 | −1 | 1.5394232 | 1.2649088 | −1 | 1.817606 | 1.4304656
AZGP1 | 9.3508493 | 1.4900096 | 1.6597 | 0.5625 | 2.2222058 | −1 | 1.6202418 | 1.329762 | −1 | 1.4723607 | 1.2398619
ANPEP | 7.1080338 | 2.3433136 | 3.865 | 1.6309 | 5.4959682 | −1 | 1.5469141 | 1.257342 | −1 | 1.7787926 | 1.4035433
AR | 8.3681098 | 0.5094853 | 0.1629 | 0.09684 | 0.2597757 | −1 | 1.0685968 | . | −1 | 1.1146402 | 1.004008
BMP6 | 4.7641658 | 1.1079887 | 0.5639 | 0.6645 | 1.2283377 | 1 | 1.5010706 | 1.2649088 | 1 | 1.7140049 | 1.4007391
CD276 | 8.9879201 | 0.5574321 | 0.168 | 0.1429 | 0.3109408 | 1 | 1.459777 | 1.2398619 | 1 | 1.6372299 | 1.3525612
CD44 | 7.5506865 | 1.1157404 | 0.8611 | 0.3849 | 1.2459543 | −1 | 1.6840393 | 1.3444702 | −1 | 1.5715077 | 1.3008267
COL8A1 | 7.1807327 | 0.9924731 | 0.529 | 0.4566 | 0.9856651 | 1 | 1.4782028 | 1.251071 | 1 | 1.6302371 | 1.3337573
CSF1 | 5.604685 | 0.806399 | 0.2828 | 0.3678 | 0.6506271 | −1 | 1.4727627 | 1.2092496 | −1 | 1.3423853 | 1.1162781
SRC | 7.4585136 | 0.6681318 | 0.2723 | 0.1745 | 0.446735 | −1 | 1.4782469 | 1.2435871 | −1 | 1.5104712 | 1.2687092
CSRP1 | 12.967068 | 0.8434745 | 0.267 | 0.4448 | 0.711785 | −1 | 1.2323596 | 1.0607752 | −1 | 1.2436215 | 1.0639623
DPP4 | 6.5096496 | 1.228945 | 0.9313 | 0.5802 | 1.5114514 | −1 | 1.5818046 | 1.2930451 | −1 | 1.539425 | 1.2788996
TNFRSF10B | 5.8731042 | 0.836448 | 0.3338 | 0.3663 | 0.7000558 | −1 | 1.5759506 | 1.2687092 | −1 | 1.4449471 | 1.2032184
ERG | 7.2194906 | 2.2910776 | 4.1737 | 1.0805 | 5.2541704 | 1 | 1.0907212 | 1.0030045 | 1 | 1.0828357 | .
FAM107A 4.9863267 1.1280370.6041 0.6691 1.2732255 −1 1.7326232 1.3539145 −1 1.5773305 1.2801791IGFBP2 9.855496 0.7862553 0.4734 0.1454 0.6187898 −1 1.6368294 1.2982277−1 1.4976178 1.2287532 CADM1 7.5914375 0.7603875 0.3308 0.2478 0.5785959−1 1.6923754 1.3539145 −1 1.7428282 1.4035433 IL6ST 10.355178 0.55097140.1851 0.1187 0.3038017 −1 1.5709644 1.2687092 −1 1.470333 1.217744LGALS3 8.5926121 0.7862207 0.3442 0.2743 0.6185665 −1 1.50482151.2411024 −1 1.3574436 1.1320159 SMAD4 8.902039 0.4085988 0.1025 0.064560.1670791 −1 1.669624 1.329762 −1 1.6100555 1.3112751 NFAT5 9.22712970.5003181 0.1647 0.08587 0.2505217 −1 1.7299501 1.3539145 −1 1.58582921.2995265 SDC1 7.2405046 0.9486094 0.4431 0.4573 0.9004048 1 1.18784371.0502204 1 1.0110389 . SHMT2 7.3818144 0.5716985 0.1376 0.18940.3270085 1 1.5185392 1.2687092 1 1.494364 1.2448313 SLC22A3 8.83662851.3865065 1.3128 0.6112 1.9240432 −1 1.6100215 1.2930451 −1 1.65313411.3271052 STAT5B 7.443638 0.479107 0.09118 0.1385 0.2296576 −1 1.49321361.2435871 −1 1.4376605 1.1948253 MMP11 4.0974635 1.1790067 0.6512 0.73961.3908598 1 1.4849754 1.257342 1 1.3586058 1.1514246 TUBB2A 8.32478210.9300317 0.553 0.3126 0.8656511 −1 1.4310473 1.1699956 −1 1.45207751.1817543

TABLE 1B
GENE | Association with PCSS Endpoint: Std. HR | Association with PCSS Endpoint: Wald p-value | Association with PCSS Endpoint: Storey q-value | Association with cRFI, CAPRA Adjusted: Std. HR | Association with cRFI, CAPRA Adjusted: Wald p-value | Association with cRFI, CAPRA Adjusted: Storey q-value | Association with primary Gleason pattern: Std. OR | Association with primary Gleason pattern: Wald p-value | Association with primary Gleason pattern: Storey q-value
ARF1 | 0.9976193 | 0.9892903 | 0.5842039 | 1.13407667 | 0.83255536 | 0.4298654 | 0.982273 | 0.907845 | 0.3887901
ATP5E | 1.7881176 | 0.0115817 | 0.0404889 | 2.78746082 | 0.07918786 | 0.0876892 | 1.3546007 | 0.0347604 | 0.0377781
CLTC | 1.026701 | 0.8673167 | 0.5533326 | 1.95450035 | 0.43611016 | 0.2800117 | 1.293925 | 0.4450274 | 0.2418238
GPS1 | 0.849428 | 0.3590422 | 0.3461617 | 1.04907052 | 0.93720193 | 0.4574281 | 0.8271353 | 0.1863038 | 0.1358186
PGK1 | 0.9801478 | 0.9057919 | 0.5657924 | 2.40228939 | 0.08542284 | 0.0920345 | 0.9775945 | 0.8943649 | 0.3866666
ASPN | 3.0547166 | 1.85E−08 | 3.97E−06 | 1.98680704 | 3.75E−07 | 8.71E−06 | 2.6178187 | 3.32E−07 | 2.89E−06
BGN | 2.6395972 | 2.34E−06 | 0.000144 | 2.65751131 | 3.29E−07 | 8.71E−06 | 2.5773715 | 1.19E−07 | 1.34E−06
COL1A1 | 2.5740243 | 1.16E−07 | 0.0000124 | 2.43783157 | 3.46E−09 | 3.73E−07 | 2.2304742 | 2.40E−06 | 0.0000138
COL1A2 | 1.6067045 | 0.0108778 | 0.0393063 | 1.30239373 | 0.14922155 | 0.1339808 | 1.1226733 | 0.4115421 | 0.2324622
COL3A1 | 2.3815758 | 7.22E−06 | 0.0002217 | 2.52337548 | 6.28E−08 | 3.29E−06 | 1.9034844 | 0.0001218 | 0.0003527
COL4A1 | 1.9704368 | 0.0008215 | 0.006465 | 2.14028481 | 0.00036078 | 0.0014267 | 1.0893045 | 0.5143164 | 0.2654536
COL5A2 | 1.9382474 | 0.0018079 | 0.0111059 | 1.20272948 | 0.23650362 | 0.1864588 | 1.1234816 | 0.4230095 | 0.2350912
F2R | 2.1687429 | 0.000107 | 0.0016437 | 1.62196106 | 0.00161154 | 0.0046735 | 2.3718651 | 0.00000433 | 0.0000227
FAP | 1.9932031 | 0.0015781 | 0.0098344 | 1.37558042 | 0.00581828 | 0.0124219 | 2.3624961 | 2.06E−07 | 2.16E−06
FN1 | 1.5366589 | 0.0242031 | 0.0667138 | 1.46470831 | 0.06004749 | 0.0723063 | 0.9406955 | 0.6630807 | 0.3149629
INHBA | 3.0596839 | 1.07E−07 | 0.0000124 | 1.98607554 | 4.28E−09 | 3.73E−07 | 2.5487503 | 5.65E−06 | 0.0000273
SFRP4 | 2.3836087 | 0.0000248 | 0.0005927 | 1.67750397 | 1.87E−05 | 0.0001515 | 2.6895594 | 6.50E−11 | 1.80E−09
SPARC | 2.249132 | 0.0000652 | 0.0012192 | 1.81223384 | 0.00166372 | 0.0047457 | 1.4031045 | 0.0236369 | 0.0279596
THBS2 | 2.5760475 | 2.97E−07 | 0.0000256 | 1.89140939 | 7.54E−07 | 0.0000146 | 1.8704603 | 0.0000261 | 0.0000967
BIN1 | 0.6582912 | 0.0008269 | 0.006465 | 0.53215394 | 1.35E−07 | 4.02E−06 | 0.4346961 | 2.11E−09 | 4.27E−08
C7 | 0.5305767 | 4.96E−06 | 0.0001778 | 0.61660218 | 1.43E−05 | 0.0001297 | 0.3687542 | 3.26E−12 | 1.24E−10
COL6A1 | 0.6814495 | 0.0146421 | 0.0470118 | 0.59821729 | 8.17E−05 | 0.0004585 | 0.4431674 | 2.65E−07 | 2.52E−06
DES | 0.730098 | 0.0532354 | 0.1095273 | 0.61335977 | 0.00103514 | 0.0033666 | 0.3442483 | 2.78E−08 | 3.52E−07
FLNC | 0.741509 | 0.0356714 | 0.0847442 | 0.85002893 | 0.23258502 | 0.1843726 | 0.3263553 | 4.74E−09 | 8.68E−08
GPM6B | 0.6663566 | 0.0048803 | 0.0233566 | 0.60320026 | 0.00006446 | 0.0003935 | 0.4814972 | 2.01E−06 | 0.0000117
GSN | 0.6464018 | 0.0057152 | 0.0257736 | 0.45512167 | 1.29E−07 | 4.02E−06 | 0.463566 | 3.08E−07 | 2.84E−06
GSTM1 | 0.6720834 | 0.0063399 | 0.0269917 | 0.61970734 | 4.44E−06 | 0.000058 | 0.4449932 | 1.22E−09 | 2.65E−08
GSTM2 | 0.514483 | 0.0000907 | 0.0015002 | 0.52721579 | 4.50E−06 | 0.000058 | 0.2939022 | 3.94E−13 | 2.00E−11
HLF | 0.5812615 | 0.0004971 | 0.0047504 | 0.52096012 | 2.14E−06 | 0.0000338 | 0.4179279 | 8.18E−09 | 1.31E−07
IGF1 | 0.6118674 | 0.0001721 | 0.0022429 | 0.62158322 | 7.65E−06 | 0.0000807 | 0.3470943 | 2.24E−13 | 1.36E−11
IGFBP6 | 0.5776972 | 0.003161 | 0.0172052 | 0.60163498 | 7.96E−05 | 0.0004543 | 0.4536368 | 1.44E−08 | 2.18E−07
ITGA7 | 0.6760378 | 0.0331669 | 0.0810327 | 0.54167205 | 2.10E−05 | 0.0001661 | 0.3682462 | 4.07E−10 | 1.03E−08
OLFML3 | 0.6460637 | 0.0011279 | 0.0080836 | 0.57873289 | 0.00001453 | 0.0001297 | 0.4154584 | 1.88E−08 | 2.64E−07
PAGE4 | 0.5182669 | 5.75E−06 | 0.0001903 | 0.66287751 | 2.46E−06 | 0.0000357 | 0.2677212 | 2.80E−17 | 8.51E−15
PPAP2B | 0.5680087 | 0.0006371 | 0.0055913 | 0.45475585 | 6.81E−06 | 0.0000765 | 0.4140322 | 4.85E−09 | 8.68E−08
PPP1R12A | 0.6937407 | 0.0165382 | 0.0496415 | 0.46377868 | 0.00149833 | 0.0044262 | 0.4793933 | 0.0000198 | 0.0000772
PRKCA | 0.6323113 | 0.0014455 | 0.0092768 | 0.52602482 | 6.68E−05 | 0.0004008 | 0.3682007 | 6.36E−09 | 1.07E−07
SRD5A2 | 0.4878954 | 2.93E−06 | 0.0001573 | 0.53197502 | 1.48E−09 | 2.58E−07 | 0.2848852 | 1.63E−14 | 1.45E−12
VCL | 0.6936282 | 0.4610739 | 0.3983524 | 0.48920691 | 0.00010426 | 0.0005415 | 0.4393103 | 7.90E−06 | 0.0000369
TGFB1I1 | 0.6700491 | 0.0291804 | 0.0749233 | 0.58638267 | 0.00360838 | 0.0089058 | 0.3711508 | 1.96E−08 | 2.64E−07
TPM2 | 0.6225776 | 0.0050533 | 0.0236188 | 0.55889674 | 0.00020286 | 0.0008936 | 0.3008674 | 1.04E−09 | 2.44E−08
TPX2 | 2.07392 | 0.0416277 | 0.0942101 | 1.80670715 | 7.56E−08 | 3.29E−06 | 2.1153062 | 4.98E−06 | 0.0000248
CDC20 | 1.7300441 | 0.0000725 | 0.0012988 | 1.94643748 | 4.81E−06 | 0.0000598 | 1.6858143 | 0.000045 | 0.0001519
CDKN2C | 1.99305 | 0.0125796 | 0.0432737 | 2.2133602 | 1.45E−05 | 0.0001297 | 1.6207388 | 0.00039 | 0.0009879
MYBL2 | 1.7372773 | 0.012916 | 0.0440786 | 1.64874559 | 6.39E−06 | 0.0000745 | 1.4091306 | 0.0068668 | 0.0098934
UBE2T | 1.898363 | 0.0847937 | 0.1462198 | 1.87982565 | 0.00012433 | 0.0005929 | 1.594593 | 0.0006065 | 0.0014404
CYP3A5 | 0.5067698 | 0.0003008 | 0.0032333 | 0.56312246 | 1.39E−07 | 4.02E−06 | 0.5204925 | 9.40E−06 | 0.0000426
KRT15 | 0.6841526 | 0.0052343 | 0.0242015 | 0.7687095 | 7.50E−05 | 0.000435 | 0.5603937 | 0.0000173 | 0.0000713
KRT5 | 0.6729884 | 0.0041853 | 0.0216827 | 0.7264908 | 0.0002245 | 0.0009645 | 0.6401711 | 0.000719 | 0.0016558
LAMB3 | 0.7403354 | 0.0266386 | 0.0724973 | 0.76722056 | 0.00366663 | 0.0089858 | 0.730357 | 0.0177696 | 0.0219488
EGR1 | 0.4902253 | 0.0003344 | 0.0034238 | 0.60204599 | 1.58E−05 | 0.0001377 | 0.5936819 | 0.0007056 | 0.0016501
FOS | 0.5555161 | 0.0045741 | 0.0223508 | 0.6374611 | 5.34E−05 | 0.0003316 | 0.5788589 | 0.0002156 | 0.0005851
GADD45B | 0.5541679 | 0.0021788 | 0.0124919 | 0.64836478 | 3.81E−05 | 0.0002707 | 0.6176176 | 0.002403 | 0.0043482
JUN | 0.505934 | 0.0013437 | 0.0088891 | 0.54410296 | 1.78E−05 | 0.00015 | 0.4795161 | 3.99E−07 | 3.37E−06
ZFP36 | 0.5757824 | 0.001207 | 0.0083714 | 0.66520845 | 0.00023482 | 0.0009966 | 0.6470667 | 0.0043499 | 0.0068874
DUSP1 | 0.6603212 | 0.0498518 | 0.1055975 | 0.63670824 | 0.00347613 | 0.0086407 | 0.586205 | 0.0007564 | 0.0017289
EGR3 | 0.5613678 | 0.0009351 | 0.0071803 | 0.72134117 | 0.0008831 | 0.002955 | 0.7186042 | 0.0260364 | 0.0300953
FAM13C | 0.5260925 | 4.01E−09 | 1.72E−06 | 0.52541845 | 6.04E−10 | 2.10E−07 | 0.3709836 | 5.73E−11 | 1.74E−09
KLK2 | 0.5808923 | 0.0001686 | 0.0022429 | 0.56994638 | 0.00194411 | 0.0053697 | 0.5968229 | 0.0002289 | 0.0006158
ALDH1A2 | 0.5608861 | 0.0000177 | 0.0004751 | 0.65303146 | 0.00010806 | 0.000545 | 0.2822456 | 4.34E−07 | 3.57E−06
AZGP1 | 0.6167537 | 3.96E−06 | 0.0001778 | 0.6708779 | 4.29E−07 | 9.33E−06 | 0.5150783 | 1.17E−06 | 7.43E−06
ANPEP | 0.5313085 | 0.0008229 | 0.006465 | 0.80509148 | 0.00011244 | 0.000559 | 0.6761885 | 0.0081251 | 0.0114354
AR | 0.9479643 | 0.7435374 | 0.510526 | 0.77328075 | 0.35466409 | 0.244458 | 0.9337631 | 0.6073904 | 0.295812
BMP6 | 1.4900185 | 0.0210638 | 0.060383 | 1.44999312 | 0.00201534 | 0.0054792 | 2.3713254 | 6.85E−07 | 4.86E−06
CD276 | 1.6684232 | 0.0049028 | 0.0233566 | 2.16998768 | 0.00057532 | 0.0020641 | 2.1955258 | 2.75E−06 | 0.0000155
CD44 | 0.6866428 | 0.0155666 | 0.0485038 | 0.5502693 | 1.06E−07 | 4.02E−06 | 0.7793905 | 0.0682278 | 0.0654298
COL8A1 | 2.2449967 | 0.0000271 | 0.0006141 | 1.91011656 | 4.90E−05 | 0.0003216 | 1.8755907 | 9.13E−06 | 0.0000421
CSF1 | 0.6749873 | 0.0193771 | 0.0562984 | 0.44321491 | 5.01E−07 | 0.0000103 | 0.9573718 | 0.7643438 | 0.3461448
SRC | 0.6670294 | 0.0040025 | 0.0212478 | 0.47320013 | 1.67E−06 | 0.0000307 | 0.766355 | 0.0813478 | 0.0758976
CSRP1 | 0.7112339 | 0.0067019 | 0.0277098 | 0.89239013 | 0.2633928 | 0.1994666 | 0.4248705 | 0.0058729 | 0.0088825
DPP4 | 0.5441442 | 1.12E−06 | 0.00008 | 0.68915455 | 0.00013991 | 0.0006492 | 0.4140282 | 1.92E−06 | 0.0000114
TNFRSF10B | 0.6852925 | 0.0143692 | 0.0468086 | 0.53054603 | 4.04E−05 | 0.0002757 | 0.7430912 | 0.0304132 | 0.0339912
ERG | 1.0765349 | 0.6794217 | 0.4926667 | 1.12737455 | 0.0349341 | 0.0496809 | 0.8943961 | 0.4148417 | 0.2324622
FAM107A | 0.540565 | 0.0000605 | 0.0011827 | 0.57090059 | 6.82E−08 | 3.29E−06 | 0.3476335 | 1.99E−08 | 2.64E−07
IGFBP2 | 0.6977969 | 0.0532257 | 0.1095273 | 0.45025927 | 6.42E−06 | 0.0000745 | 0.6063083 | 0.0001586 | 0.0004465
CADM1 | 0.6456383 | 0.0150546 | 0.0472518 | 0.40819615 | 3.14E−08 | 2.19E−06 | 0.5598139 | 0.0001184 | 0.000346
IL6ST | 0.5740052 | 0.0003647 | 0.0036466 | 0.33440325 | 2.36E−06 | 0.0000357 | 0.5462964 | 0.0040541 | 0.0065556
LGALS3 | 0.6782394 | 0.0071303 | 0.0283525 | 0.53406803 | 5.18E−05 | 0.0003278 | 0.5903729 | 0.0030449 | 0.0052894
SMAD4 | 0.5277628 | 4.87E−06 | 0.0001778 | 0.24376793 | 2.03E−06 | 0.0000336 | 0.3346823 | 1.85E−06 | 0.0000112
NFAT5 | 0.5361732 | 0.0000856 | 0.0014722 | 0.20926313 | 3.51E−07 | 8.71E−06 | 0.5518236 | 0.0000356 | 0.000126
SDC1 | 1.7097015 | 0.007187 | 0.0283525 | 1.43080815 | 0.02325931 | 0.0359744 | 1.6597668 | 0.0010445 | 0.002268
SHMT2 | 1.9491131 | 0.0031065 | 0.0171257 | 1.94514573 | 0.00591315 | 0.0124714 | 1.6896076 | 0.0074605 | 0.0105488
SLC22A3 | 0.5168636 | 0.000117 | 0.001706 | 0.65464678 | 7.32E−06 | 0.0000796 | 0.2293355 | 1.91E−14 | 1.45E−12
STAT5B | 0.7002104 | 0.0396042 | 0.0914258 | 0.44673718 | 0.00045662 | 0.0017462 | 0.5417213 | 0.0000465 | 0.0001553
MMP11 | 1.8691119 | 0.0001041 | 0.0016437 | 1.62300343 | 1.23E−05 | 0.0001222 | 2.3250222 | 7.87E−07 | 5.44E−06
TUBB2A | 0.6134538 | 0.0026235 | 0.0148438 | 0.56476388 | 1.81E−05 | 0.00015 | 0.9566513 | 0.7630842 | 0.3461448

Example 2 Algorithm Development Based on Data from a Companion Study

The Cleveland Clinic (“CC”) Companion study consists of three patient cohorts and separate analyses for each cohort, as described in Table 2. The first cohort (Table 2) includes men with low to high risk (based on AUA criteria) prostate cancer from Gene ID study 09-002 who underwent RP at CC between 1987 and 2004 and had diagnostic biopsy tissue available at CC. Cohorts 2 and 3 include men with clinically localized Low and Intermediate Risk (based on AUA criteria) prostate cancer, respectively, who might have been reasonable candidates for active surveillance but who underwent radical prostatectomy (RP) within 6 months of the diagnosis of prostate cancer by biopsy. The main objective of Cohort 1 was to compare the molecular profile from biopsy tissue with that from radical prostatectomy tissue. The main objective of Cohorts 2 and 3 was to develop a multigene predictor of upgrading/upstaging at RP using biopsy tissue in low to intermediate risk patients at diagnosis.

Matched biopsy samples were obtained for a subset of the patients (70 patients) from the gene identification study. Gene expression of the 81 selected genes and the 5 reference genes (ARF1, ATP5E, CLTC, GPS1, PGK1) was compared between the RP specimens and the biopsy tissue obtained from these 70 patients.

The 81 genes were evaluated in Cohorts 2 and 3 for association with upgrading and upstaging. The associations between these 81 genes and upgrading and upstaging in Cohorts 2 and 3 are shown in Table 3. P-values and standardized odds ratios are provided.

In this context, “upgrade” refers to an increase in Gleason grade from 3+3 or 3+4 at the time of biopsy to greater than or equal to 3+4 at the time of RP. “Upgrade2” refers to an increase in Gleason grade from 3+3 or 3+4 at the time of biopsy to greater than or equal to 4+3 at the time of RP.

TABLE 2
Cohort # | Cohort Description | # of Patients | Objectives
1 | Subset of patients from Gene ID study 09-002 who underwent RP at CC between 1987 and 2004 and had diagnostic biopsy tissue available at CC. Patients from the original stratified cohort sample with available biopsy tissue blocks | 70 | Comparison of gene expression from biopsy sample with gene expression from RP specimen (Co-Primary Objective); Explore association of risk of recurrence after RP with gene expression from biopsy sample and gene expression from RP sample; Explore association of risk of recurrence after RP with gene expression from RP samples
2 | Low Risk Patients from CC database of patients who were biopsied, and then underwent RP at CC between 1999 and 2010. All patients in database who meet minimum tumor tissue criteria | 92 | Association between gene expression from biopsy sample and likelihood of upgrading/upstaging in tissue obtained at prostatectomy (Co-Primary Objective)
3 | Intermediate Risk Patients from CC database of patients who were biopsied, and then underwent RP at CC between 1999 and 2010. All patients in database who meet minimum tumor tissue criteria | 75 | Association between gene expression from biopsy sample and likelihood of upgrading/upstaging in tissue obtained at prostatectomy

Several different models were explored to compare expression between the RP and biopsy specimens. Genes were chosen based on consistency of expression between the RP and biopsy specimens. FIGS. 2A-2E are the scatter plots showing the comparison of normalized gene expression (Cp) for matched samples from each patient, where the x-axis is the normalized gene expression from the PGP RP sample (PGP) and the y-axis is the normalized gene expression from the biopsy sample (BX). FIGS. 3A-3D show range plots of gene expression of individual genes within each gene group in the biopsy (BX) and PGP RP samples.

After evaluating the concordance of gene expression in biopsy and RP samples, the following algorithms (RS models) shown in Table 4 were developed, where the weights are determined using non-standardized, but normalized, data. Some genes, such as SRD5A2 and GSTM2, which fall within the cellular organization gene group, were also evaluated separately and independent coefficients were assigned (see the “other” category in Table 4). In other instances, GSTM1 and GSTM2 were grouped as an oxidative “stress” group and a coefficient was assigned to this “stress” group (see RS20 and RS22 models). Other genes, such as AZGP1 and SLC22A3, which did not fall within any of the gene groups, were also included in certain algorithms (see the “other” category in Table 4). Furthermore, the androgen gene group was established to include FAM13C, KLK2, AZGP1, and SRD5A2. Some genes, such as BGN, SPARC, FLNC, GSN, TPX2 and SRD5A2, were thresholded before being evaluated in models. For example, normalized expression values below 4.5 were set to 4.5 for TPX2 and normalized expression values below 5.5 were set to 5.5 for SRD5A2.

TABLE 3 Association between the 81 genes and Upgrading and Upstaging in Cohorts 2/3
Gene | N | Upgrade p-value | Upgrade Std OR | Upgrade 95% CI | Upgrade2 p-value | Upgrade2 Std OR | Upgrade2 95% CI | Upstage p-value | Upstage Std OR | Upstage 95% CI
ALDH1A2 | 167 | 0.501 | 1.11 | (0.82, 1.52) | 0.932 | 1.02 | (0.70, 1.47) | 0.388 | 0.86 | (0.61, 1.22)
ANPEP | 167 | 0.054 | 1.36 | (0.99, 1.87) | 0.933 | 0.98 | (0.68, 1.42) | 0.003 | 0.58 | (0.40, 0.83)
AR | 167 | 0.136 | 1.27 | (0.93, 1.74) | 0.245 | 0.81 | (0.56, 1.16) | 0.005 | 0.60 | (0.42, 0.86)
ARF1 | 167 | 0.914 | 0.98 | (0.72, 1.34) | 0.051 | 1.45 | (1.00, 2.11) | 0.371 | 1.17 | (0.83, 1.66)
ASPN | 167 | 0.382 | 1.15 | (0.84, 1.56) | 0.040 | 1.60 | (1.02, 2.51) | 0.069 | 1.46 | (0.97, 2.19)
ATP5E | 167 | 0.106 | 1.30 | (0.95, 1.77) | 0.499 | 0.88 | (0.61, 1.27) | 0.572 | 0.90 | (0.64, 1.28)
AZGP1 | 167 | 0.192 | 1.23 | (0.90, 1.68) | 0.190 | 0.79 | (0.55, 1.13) | 0.005 | 0.59 | (0.41, 0.85)
BGN | 167 | 0.568 | 0.91 | (0.67, 1.25) | 0.001 | 2.15 | (1.39, 3.33) | 0.020 | 1.56 | (1.07, 2.28)
BIN1 | 167 | 0.568 | 1.09 | (0.80, 1.49) | 0.634 | 0.92 | (0.64, 1.32) | 0.104 | 0.75 | (0.54, 1.06)
BMP6 | 167 | 0.509 | 0.90 | (0.66, 1.23) | 0.015 | 1.59 | (1.09, 2.30) | 0.650 | 1.08 | (0.77, 1.54)
C7 | 167 | 0.677 | 1.07 | (0.78, 1.46) | 0.013 | 1.66 | (1.11, 2.47) | 0.223 | 0.80 | (0.56, 1.14)
CADM1 | 167 | 0.082 | 0.74 | (0.52, 1.04) | 0.235 | 0.81 | (0.57, 1.15) | 0.039 | 0.69 | (0.48, 0.98)
CD276 | 167 | 0.454 | 0.89 | (0.65, 1.21) | 0.362 | 0.84 | (0.58, 1.22) | 0.214 | 1.25 | (0.88, 1.78)
CD44 | 167 | 0.122 | 1.28 | (0.94, 1.75) | 0.305 | 1.23 | (0.83, 1.81) | 0.876 | 0.97 | (0.69, 1.38)
CDC20 | 166 | 0.567 | 1.10 | (0.80, 1.50) | 0.298 | 1.21 | (0.84, 1.75) | 0.279 | 1.21 | (0.86, 1.71)
CDKN2C | 152 | 0.494 | 0.89 | (0.64, 1.24) | 0.908 | 0.98 | (0.67, 1.43) | 0.834 | 1.04 | (0.72, 1.49)
CLTC | 167 | 0.102 | 0.76 | (0.55, 1.06) | 0.300 | 0.82 | (0.57, 1.19) | 0.264 | 0.82 | (0.58, 1.16)
COL1A1 | 167 | 0.732 | 1.06 | (0.77, 1.44) | 0.000 | 3.04 | (1.93, 4.79) | 0.006 | 1.65 | (1.15, 2.36)
COL1A2 | 167 | 0.574 | 0.91 | (0.67, 1.25) | 0.017 | 1.65 | (1.09, 2.50) | 0.521 | 0.89 | (0.63, 1.26)
COL3A1 | 167 | 0.719 | 0.94 | (0.69, 1.29) | 0.000 | 2.98 | (1.88, 4.71) | 0.020 | 1.53 | (1.07, 2.20)
COL4A1 | 167 | 0.682 | 0.94 | (0.69, 1.28) | 0.000 | 2.12 | (1.39, 3.22) | 0.762 | 0.95 | (0.67, 1.35)
COL5A2 | 167 | 0.499 | 1.11 | (0.82, 1.52) | 0.009 | 1.81 | (1.16, 2.83) | 0.516 | 0.89 | (0.63, 1.26)
COL6A1 | 167 | 0.878 | 0.98 | (0.72, 1.33) | 0.001 | 2.14 | (1.37, 3.34) | 0.883 | 1.03 | (0.72, 1.46)
COL8A1 | 165 | 0.415 | 0.88 | (0.64, 1.20) | 0.000 | 3.24 | (1.88, 5.61) | 0.044 | 1.51 | (1.01, 2.25)
CSF1 | 167 | 0.879 | 1.02 | (0.75, 1.40) | 0.187 | 1.31 | (0.88, 1.96) | 0.110 | 0.76 | (0.54, 1.07)
CSRP1 | 165 | 0.258 | 1.20 | (0.87, 1.65) | 0.226 | 1.26 | (0.87, 1.82) | 0.641 | 0.92 | (0.65, 1.31)
CYP3A5 | 167 | 0.989 | 1.00 | (0.73, 1.36) | 0.188 | 1.28 | (0.88, 1.87) | 0.937 | 1.01 | (0.71, 1.44)
DES | 167 | 0.776 | 1.05 | (0.77, 1.43) | 0.088 | 1.40 | (0.95, 2.05) | 0.242 | 0.81 | (0.57, 1.15)
DPP4 | 167 | 0.479 | 0.89 | (0.65, 1.22) | 0.005 | 0.60 | (0.42, 0.85) | 0.000 | 0.51 | (0.36, 0.74)
DUSP1 | 167 | 0.295 | 0.84 | (0.61, 1.16) | 0.262 | 0.82 | (0.58, 1.16) | 0.427 | 0.87 | (0.62, 1.22)
EGR1 | 167 | 0.685 | 0.94 | (0.69, 1.28) | 0.217 | 1.27 | (0.87, 1.85) | 0.370 | 1.18 | (0.83, 1.68)
EGR3 | 166 | 0.025 | 0.69 | (0.50, 0.95) | 0.539 | 0.89 | (0.62, 1.29) | 0.735 | 1.06 | (0.75, 1.51)
ERG | 166 | 0.002 | 0.58 | (0.42, 0.81) | 0.000 | 0.42 | (0.28, 0.64) | 0.768 | 1.05 | (0.74, 1.50)
F2R | 160 | 0.324 | 0.85 | (0.62, 1.17) | 0.009 | 1.77 | (1.16, 2.70) | 0.000 | 2.39 | (1.52, 3.76)
FAM107A | 143 | 0.832 | 1.04 | (0.74, 1.45) | 0.088 | 1.42 | (0.95, 2.11) | 0.687 | 1.08 | (0.74, 1.58)
FAM13C | 167 | 0.546 | 1.10 | (0.81, 1.50) | 0.041 | 0.68 | (0.47, 0.98) | 0.003 | 0.58 | (0.40, 0.83)
FAP | 167 | 0.540 | 0.91 | (0.67, 1.24) | 0.093 | 1.37 | (0.95, 1.97) | 0.001 | 1.85 | (1.28, 2.68)
FLNC | 167 | 0.963 | 1.01 | (0.74, 1.37) | 0.254 | 1.26 | (0.85, 1.87) | 0.030 | 0.68 | (0.48, 0.96)
FN1 | 167 | 0.530 | 0.91 | (0.66, 1.23) | 0.005 | 1.73 | (1.18, 2.53) | 0.364 | 1.17 | (0.83, 1.66)
FOS | 167 | 0.649 | 0.93 | (0.68, 1.27) | 0.071 | 1.38 | (0.97, 1.97) | 0.015 | 1.53 | (1.09, 2.16)
GADD45B | 167 | 0.978 | 1.00 | (0.73, 1.36) | 0.105 | 1.38 | (0.94, 2.04) | 0.876 | 0.97 | (0.69, 1.38)
GPM6B | 159 | 0.944 | 0.99 | (0.72, 1.36) | 0.002 | 1.95 | (1.27, 2.97) | 0.266 | 0.81 | (0.57, 1.17)
GPS1 | 167 | 0.404 | 1.14 | (0.84, 1.56) | 0.609 | 0.91 | (0.62, 1.32) | 0.125 | 1.31 | (0.93, 1.86)
GSN | 167 | 0.272 | 0.84 | (0.61, 1.15) | 0.309 | 0.83 | (0.57, 1.19) | 0.027 | 0.67 | (0.47, 0.96)
GSTM1 | 167 | 0.178 | 1.24 | (0.91, 1.69) | 0.762 | 0.95 | (0.66, 1.36) | 0.000 | 0.50 | (0.34, 0.72)
GSTM2 | 167 | 0.145 | 1.26 | (0.92, 1.73) | 0.053 | 1.48 | (1.00, 2.20) | 0.654 | 0.92 | (0.65, 1.31)
HLF | 167 | 0.979 | 1.00 | (0.73, 1.36) | 0.602 | 1.11 | (0.76, 1.62) | 0.030 | 0.69 | (0.49, 0.96)
IGF1 | 167 | 0.313 | 1.17 | (0.86, 1.60) | 0.878 | 0.97 | (0.67, 1.40) | 0.146 | 0.77 | (0.55, 1.09)
IGFBP2 | 167 | 0.253 | 1.20 | (0.88, 1.64) | 0.493 | 0.88 | (0.61, 1.27) | 0.051 | 0.70 | (0.49, 1.00)
IGFBP6 | 167 | 0.336 | 0.86 | (0.62, 1.17) | 0.510 | 1.14 | (0.78, 1.66) | 0.204 | 0.80 | (0.57, 1.13)
IL6ST | 167 | 0.774 | 1.05 | (0.77, 1.43) | 0.541 | 1.12 | (0.77, 1.63) | 0.235 | 0.81 | (0.57, 1.15)
INHBA | 167 | 0.104 | 1.30 | (0.95, 1.78) | 0.002 | 1.89 | (1.26, 2.84) | 0.077 | 1.38 | (0.97, 1.97)
ITGA7 | 167 | 0.990 | 1.00 | (0.73, 1.36) | 0.780 | 1.05 | (0.73, 1.53) | 0.470 | 0.88 | (0.62, 1.25)
JUN | 167 | 0.586 | 1.09 | (0.80, 1.48) | 0.538 | 0.89 | (0.62, 1.28) | 0.259 | 0.82 | (0.59, 1.15)
KLK2 | 167 | 0.267 | 0.84 | (0.61, 1.15) | 0.003 | 0.56 | (0.38, 0.82) | 0.007 | 0.61 | (0.42, 0.87)
KRT15 | 167 | 0.500 | 0.90 | (0.65, 1.23) | 0.738 | 0.94 | (0.65, 1.35) | 0.987 | 1.00 | (0.71, 1.42)
KRT5 | 152 | 0.834 | 0.97 | (0.70, 1.34) | 0.632 | 1.10 | (0.74, 1.63) | 0.908 | 0.98 | (0.68, 1.40)
LAMB3 | 167 | 0.090 | 1.31 | (0.96, 1.79) | 0.013 | 1.73 | (1.12, 2.68) | 0.132 | 1.33 | (0.92, 1.94)
LGALS3 | 166 | 0.345 | 1.16 | (0.85, 1.59) | 0.405 | 1.18 | (0.80, 1.72) | 0.208 | 0.80 | (0.57, 1.13)
MMP11 | 167 | 0.715 | 1.06 | (0.78, 1.45) | 0.080 | 1.37 | (0.96, 1.96) | 0.257 | 1.22 | (0.87, 1.71)
MYBL2 | 167 | 0.235 | 1.21 | (0.88, 1.67) | 0.868 | 1.03 | (0.71, 1.49) | 0.266 | 1.21 | (0.86, 1.70)
NFAT5 | 167 | 0.514 | 0.90 | (0.66, 1.23) | 0.058 | 0.70 | (0.48, 1.01) | 0.530 | 0.89 | (0.63, 1.27)
OLFML3 | 167 | 0.448 | 0.89 | (0.65, 1.21) | 0.056 | 1.50 | (0.99, 2.28) | 0.129 | 0.77 | (0.54, 1.08)
PAGE4 | 167 | 0.914 | 0.98 | (0.72, 1.34) | 0.211 | 0.80 | (0.56, 1.14) | 0.005 | 0.61 | (0.43, 0.86)
PGK1 | 167 | 0.138 | 0.78 | (0.56, 1.08) | 0.666 | 0.92 | (0.64, 1.33) | 0.292 | 0.83 | (0.59, 1.17)
PPAP2B | 167 | 0.952 | 0.99 | (0.73, 1.35) | 0.989 | 1.00 | (0.69, 1.44) | 0.221 | 0.80 | (0.56, 1.14)
PPP1R12A | 167 | 0.547 | 0.91 | (0.66, 1.24) | 0.563 | 0.90 | (0.63, 1.29) | 0.001 | 0.55 | (0.38, 0.79)
PRKCA | 167 | 0.337 | 1.17 | (0.85, 1.59) | 0.141 | 1.35 | (0.90, 2.03) | 0.029 | 0.67 | (0.46, 0.96)
SDC1 | 167 | 0.064 | 1.36 | (0.98, 1.87) | 0.013 | 1.83 | (1.14, 2.96) | 0.037 | 1.58 | (1.03, 2.42)
SFRP4 | 166 | 0.986 | 1.00 | (0.73, 1.37) | 0.047 | 1.47 | (1.01, 2.15) | 0.031 | 1.49 | (1.04, 2.14)
SHMT2 | 167 | 0.133 | 0.78 | (0.56, 1.08) | 0.147 | 0.77 | (0.53, 1.10) | 0.715 | 0.94 | (0.66, 1.33)
SLC22A3 | 167 | 0.828 | 1.03 | (0.76, 1.41) | 0.044 | 0.69 | (0.48, 0.99) | 0.050 | 0.71 | (0.50, 1.00)
SMAD4 | 167 | 0.165 | 1.25 | (0.91, 1.71) | 0.333 | 0.83 | (0.58, 1.21) | 0.021 | 0.65 | (0.45, 0.94)
SPARC | 167 | 0.810 | 0.96 | (0.71, 1.31) | 0.000 | 2.15 | (1.40, 3.30) | 0.154 | 1.30 | (0.91, 1.86)
SRC | 167 | 0.083 | 1.34 | (0.96, 1.86) | 0.750 | 1.06 | (0.72, 1.56) | 0.550 | 0.90 | (0.64, 1.26)
SRD5A2 | 167 | 0.862 | 0.97 | (0.71, 1.33) | 0.122 | 0.75 | (0.53, 1.08) | 0.010 | 0.63 | (0.45, 0.90)
STAT5B | 167 | 0.298 | 0.84 | (0.62, 1.16) | 0.515 | 0.89 | (0.62, 1.27) | 0.016 | 0.65 | (0.46, 0.92)
TGFB1I1 | 167 | 0.985 | 1.00 | (0.74, 1.37) | 0.066 | 1.45 | (0.98, 2.14) | 0.131 | 0.76 | (0.54, 1.08)
THBS2 | 167 | 0.415 | 1.14 | (0.83, 1.56) | 0.001 | 1.91 | (1.30, 2.80) | 0.288 | 1.21 | (0.85, 1.70)
TNFRSF10B | 167 | 0.214 | 1.22 | (0.89, 1.66) | 0.805 | 0.95 | (0.66, 1.38) | 0.118 | 0.76 | (0.54, 1.07)
TPM2 | 167 | 0.996 | 1.00 | (0.73, 1.36) | 0.527 | 1.13 | (0.78, 1.64) | 0.094 | 0.74 | (0.52, 1.05)
TPX2 | 167 | 0.017 | 1.48 | (1.07, 2.04) | 0.002 | 1.89 | (1.26, 2.83) | 0.001 | 1.91 | (1.30, 2.80)
TUBB2A | 167 | 0.941 | 0.99 | (0.73, 1.35) | 0.182 | 0.78 | (0.54, 1.12) | 0.111 | 0.75 | (0.53, 1.07)
UBE2T | 167 | 0.095 | 1.36 | (0.95, 1.96) | 0.009 | 1.58 | (1.12, 2.23) | 0.084 | 1.33 | (0.96, 1.84)
VCL | 167 | 0.954 | 0.99 | (0.73, 1.35) | 0.165 | 1.31 | (0.90, 1.91) | 0.265 | 0.82 | (0.57, 1.16)
ZFP36 | 167 | 0.685 | 1.07 | (0.78, 1.45) | 0.784 | 0.95 | (0.66, 1.37) | 0.610 | 0.91 | (0.64, 1.29)

TABLE 4
RS Model | ECM (Stromal Response) | Migration (Cellular Organization) | Prolif. | Androgen (PSA) | Other | Algorithm
RS0 | (ASPN + BGN + COL1A1 + SPARC)/4 | (FLNC + GSN + GSTM2 + IGFBP6 + PPAP2B + PPP1R12A)/6 | (TPX2 + CDC20 + MYBL2)/3 | (FAM13C + KLK2)/2 | STAT5B, NFAT5 | 1.05 * ECM − 0.58 * Migration − 0.30 * PSA + 0.08 * Prolif − 0.16 * STAT5B − 0.23 * NFAT5
RS1 | (BGN + COL1A1 + FN1 + SPARC)/4 | (FLNC + GSN + GSTM2 + PPAP2B + PPP1R12A)/6 | . | (FAM13C + KLK2)/2 | STAT5B, NFAT5 | 1.15 * ECM − 0.72 * Migration − 0.56 * PSA − 0.45 * STAT5B − 0.56 * NFAT5
RS2 | (BGN + COL1A1 + FN1 + SPARC)/4 | (BIN1 + FLNC + GSN + GSTM2 + PPAP2B + PPP1R12A + VCL)/7 | . | (FAM13C + KLK2)/2 | STAT5B, NFAT5 | 1.16 * ECM − 0.75 * Migration − 0.57 * PSA − 0.47 * STAT5B − 0.50 * NFAT5
RS3 | (BGN + COL1A1 + COL3A1 + COL4A1 + FN1 + SPARC)/6 | (FLNC + GSN + GSTM2 + PPAP2B + PPP1R12A)/5 | . | (FAM13C + KLK2)/2 | STAT5B, NFAT5 | 1.18 * ECM − 0.75 * Migration − 0.56 * PSA − 0.40 * STAT5B − 0.48 * NFAT5
RS4 | (BGN + COL1A1 + COL3A1 + COL4A1 + FN1 + SPARC)/6 | (BIN1 + FLNC + GSN + GSTM2 + PPAP2B + PPP1R12A + VCL)/7 | . | (FAM13C + KLK2)/2 | STAT5B, NFAT5 | 1.18 * ECM − 0.76 * Migration − 0.58 * PSA − 0.43 * STAT5B − 0.43 * NFAT5
RS5 | (COL4A1 (thresholded) + INHBA + SPARC + THBS2)/4 | (BIN1 + IGF1 (thresholded) + VCL)/3 | . | KLK2 | AZGP1, ANPEP, IGFBP2 (thresholded) | 1.20 * ECM − 0.91 * Migration − 0.29 * KLK2 − 0.14 * AZGP1 + 0.05 * ANPEP − 0.56 * IGFBP2
RS6 | (BGN + COL3A1 + INHBA + SPARC)/4 | Migratn1: (FLNC + GSN + TPM2)/3; Migratn2: (GSTM2 + PPAP2B)/2 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, SLC22A3 | 1.09 * ECM − 0.44 * Migratn1 − 0.23 * Migratn2 − 0.36 * PSA + 0.15 * TPX2 − 0.16 * AZGP1 − 0.08 * SLC22A3
RS7 | (BGN + COL3A1 + INHBA + SPARC)/4 | Migratn1: (FLNC + GSN + TPM2)/3; Migratn2: (GSTM2 + PPAP2B)/2 | . | (FAM13C + KLK2)/2 | AZGP1, SLC22A3 | 1.16 * ECM − 0.53 * Migratn1 − 0.24 * Migratn2 − 0.42 * PSA − 0.14 * AZGP1 − 0.08 * SLC22A3
RS8 | (BGN + COL3A1 + SPARC)/3 | Migratn1: (FLNC + GSN + TPM2)/3; Migratn2: (GSTM2 + PPAP2B)/2 | . | KLK2 | AZGP1, SLC22A3 | 1.37 * ECM − 0.56 * Migratn1 − 0.49 * Migratn2 − 0.52 * KLK2 − 0.16 * AZGP1 − 0.00 * SLC22A3
RS9 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | Migratn1: (FLNC (thresholded) + GSN (thresholded) + TPM2)/3; Migratn2: (GSTM2 + PPAP2B)/2 | . | (FAM13C + KLK2)/2 | AZGP1, SLC22A3 | 1.28 * ECM − 1.11 * Migratn1 − 0.00 * Migratn2 − 0.34 * PSA − 0.16 * AZGP1 − 0.08 * SLC22A3
RS10 | (BGN + COL3A1 + INHBA + SPARC)/4 | (FLNC + GSN + GSTM2 + PPAP2B + TPM2)/5 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, SLC22A3 | 1.09 * ECM − 0.68 * Migration − 0.37 * PSA + 0.16 * TPX2 − 0.16 * AZGP1 − 0.08 * SLC22A3
RS11 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | (FLNC (thresholded) + GSN (thresholded) + GSTM2 + PPAP2B + TPM2)/5 | . | (FAM13C + KLK2)/2 | AZGP1, SLC22A3 | 1.19 * ECM − 0.96 * Migration − 0.39 * PSA − 0.14 * AZGP1 − 0.09 * SLC22A3
RS12 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | (FLNC (thresholded) + GSN (thresholded) + GSTM2 + PPAP2B + TPM2)/5 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, SLC22A3 | 1.13 * ECM − 0.85 * Migration − 0.34 * PSA + 0.15 * TPX2 − 0.15 * AZGP1 − 0.08 * SLC22A3
RS13 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | (FLNC (thresholded) + GSN (thresholded) + GSTM2 + PPAP2B + TPM2)/5 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, ERG, SLC22A3 | 1.12 * ECM − 0.83 * Migration − 0.33 * PSA + 0.17 * TPX2 − 0.14 * AZGP1 + 0.04 * ERG − 0.10 * SLC22A3
RS14 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | (FLNC (thresholded) + GSN (thresholded) + GSTM2 + PPAP2B + TPM2)/5 | TPX2 | (FAM13C + KLK2)/2 | AR, AZGP1, ERG, SLC22A3 | 1.13 * ECM − 0.83 * Migration − 0.35 * PSA + 0.16 * TPX2 + 0.15 * AR − 0.15 * AZGP1 + 0.03 * ERG − 0.10 * SLC22A3
RS15 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | (FLNC (thresholded) + GSN (thresholded) + GSTM2 + PPAP2B + TPM2)/5 | . | KLK2 | AR, ERG, SLC22A3 | 1.30 * ECM − 1.20 * Migration − 0.52 * KLK2 + 0.09 * AR + 0.05 * ERG − 0.06 * SLC22A3
RS16 | (BGN (thresholded) + COL3A1 + INHBA + SPARC (thresholded))/4 | (C7 + FLNC (thresholded) + GSN (thresholded) + GSTM1)/4 | . | KLK2 | AR, ERG, SLC22A3 | 1.23 * ECM − 1.02 * Migration − 0.46 * KLK2 + 0.09 * AR + 0.07 * ERG − 0.09 * SLC22A3
RS17 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + GSTM1 + TPM2)/4 | TPX2 | (FAM13C + KLK2)/2 | AR, AZGP1, ERG, SLC22A3, SRD5A2 | 0.63 * ECM − 0.12 * Migration − 0.44 * PSA + 0.19 * TPX2 − 0.02 * AR − 0.15 * AZGP1 + 0.06 * ERG − 0.13 * SLC22A3 − 0.33 * SRD5A2
RS18 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + GSTM1 + TPM2)/4 | TPX2 | (FAM13C + KLK2)/2 | AR, ERG, SLC22A3, SRD5A2 | 0.63 * ECM − 0.17 * Migration − 0.52 * PSA + 0.19 * TPX2 − 0.07 * AR + 0.09 * ERG − 0.14 * SLC22A3 − 0.36 * SRD5A2
RS19 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + GSTM1 + TPM2)/4 | . | (FAM13C + KLK2)/2 | AR, AZGP1, ERG, SLC22A3, SRD5A2 | 0.72 * ECM − 0.24 * Migration − 0.51 * PSA + 0.03 * AR − 0.15 * AZGP1 + 0.04 * ERG − 0.12 * SLC22A3 − 0.32 * SRD5A2
RS20 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + PPAP2B + TPM2)/4 | TPX2 | (FAM13C + KLK2)/2 | (Stress: GSTM1 + GSTM2), AZGP1, SLC22A3, SRD5A2 | 0.72 * ECM − 0.26 * Migration − 0.45 * PSA + 0.15 * TPX2 + 0.02 * Stress − 0.16 * AZGP1 − 0.06 * SLC22A3 − 0.30 * SRD5A2
RS21 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + PPAP2B + TPM2)/4 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, SLC22A3, SRD5A2 | 0.68 * ECM − 0.19 * Migration − 0.43 * PSA + 0.16 * TPX2 − 0.18 * AZGP1 − 0.07 * SLC22A3 − 0.31 * SRD5A2
RS22 | (BGN + COL1A1 + SFRP4)/3 | . | TPX2 | (FAM13C + KLK2)/2 | (Stress: GSTM1 + GSTM2), AZGP1, SLC22A3, SRD5A2 | 0.62 * ECM − 0.46 * PSA + 0.18 * TPX2 − 0.07 * Stress − 0.18 * AZGP1 − 0.08 * SLC22A3 − 0.34 * SRD5A2
RS23 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + GSTM2 + TPM2)/4 | TPX2 | (FAM13C + KLK2)/2 | AR, AZGP1, ERG, SRD5A2 | 0.73 * ECM − 0.26 * Migration − 0.45 * PSA + 0.17 * TPX2 + 0.02 * AR − 0.17 * AZGP1 + 0.03 * ERG − 0.29 * SRD5A2
RS24 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + GSTM1 + GSTM2 + PPAP2B + TPM2)/6 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, SLC22A3, SRD5A2 | 0.52 * ECM − 0.23 * Migration − 0.30 * PSA + 0.14 * TPX2 − 0.17 * AZGP1 − 0.07 * SLC22A3 − 0.27 * SRD5A2
RS25 | (BGN + COL1A1 + SFRP4)/3 | (FLNC + GSN + TPM2)/3 | TPX2 | (FAM13C + KLK2)/2 | AZGP1, GSTM2, SRD5A2 | 0.72 * ECM − 0.14 * Migration − 0.45 * PSA + 0.16 * TPX2 − 0.17 * AZGP1 − 0.14 * GSTM2 − 0.28 * SRD5A2
RS26 | (1.581 * BGN + 1.371 * COL1A1 + 0.469 * SFRP4)/3 | (0.489 * FLNC + 1.512 * GSN + 1.264 * TPM2)/3 | TPX2 (thresholded) | (1.267 * FAM13C + 2.158 * KLK2)/2 | AZGP1, GSTM2, SRD5A2 (thresholded) | 0.735 * ECM − 0.368 * Migration − 0.352 * PSA + 0.094 * TPX2 − 0.226 * AZGP1 − 0.145 * GSTM2 − 0.351 * SRD5A2
RS27 | (1.581 * BGN + 1.371 * COL1A1 + 0.469 * SFRP4)/3 = 0.527 * BGN + 0.457 * COL1A1 + 0.156 * SFRP4 | [(0.489 * FLNC + 1.512 * GSN + 1.264 * TPM2)/3] + (0.145 * GSTM2/0.368) = 0.163 * FLNC + 0.504 * GSN + 0.421 * TPM2 + 0.394 * GSTM2 | TPX2 (thresholded) | [(1.267 * FAM13C + 2.158 * KLK2)/2] + (0.226 * AZGP1/0.352) + (0.351 * SRD5A2Thresh/0.352) = 0.634 * FAM13C + 1.079 * KLK2 + 0.642 * AZGP1 + 0.997 * SRD5A2Thresh | . | 0.735 * ECM − 0.368 * Migration − 0.352 * PSA + 0.095 * TPX2
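
As an illustration of how a row of Table 4 could be evaluated, the following is a minimal sketch of the RS25 model in Python. It assumes that each gene symbol maps to its normalized expression value, that the gene group scores are simple averages as written in the RS25 row, and that “AZGP11” in that row denotes AZGP1. The function name and the dictionary input are hypothetical and are not part of the described method.

```python
from statistics import mean


def rs25_unscaled(expr):
    """Unscaled RS25 score from normalized expression values.

    `expr` is a hypothetical mapping of gene symbol -> normalized Cp.
    Group definitions and weights follow the RS25 row of Table 4.
    """
    ecm = mean([expr["BGN"], expr["COL1A1"], expr["SFRP4"]])   # ECM (stromal response)
    migration = mean([expr["FLNC"], expr["GSN"], expr["TPM2"]])  # cellular organization
    psa = mean([expr["FAM13C"], expr["KLK2"]])                   # androgen (PSA) group
    return (
        0.72 * ecm
        - 0.14 * migration
        - 0.45 * psa
        + 0.16 * expr["TPX2"]      # proliferation
        - 0.17 * expr["AZGP1"]     # assumption: source "AZGP11" read as AZGP1
        - 0.14 * expr["GSTM2"]
        - 0.28 * expr["SRD5A2"]
    )
```

Because the RS25 weights sum to −0.30, a sample in which every gene has the same normalized value v would score −0.30 × v under this sketch, which gives a quick sanity check of the arithmetic.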

Table 5A shows the standardized odds ratio of each of the RS models using the data from the original Gene ID study described in Example 1 for time to cR and for upgrading and upstaging and the combination of significant upgrading and upstaging. Table 5B shows the performance of each of the RS models using the data from the CC Companion (Cohorts 2 and 3) study for upgrading and upstaging and the combination of significant upgrading and upstaging. In this context, “upgrading” refers to an increase in Gleason grade from 3+3 or 3+4 at biopsy to greater than or equal to 3+4 at radical prostatectomy. “Significant upgrading” in this context refers to upgrading from Gleason grade 3+3 or 3+4 at biopsy to equal to or greater than 4+3 at radical prostatectomy.

In addition, the gene groups used in the RS25 model were evaluated alone and in various combinations. Table 6A shows the results of this analysis using the data from the Gene Identification study and Table 6B shows the results of this analysis using the data from Cohorts 2 and 3 of the CC Companion Study.

The gene expression for some genes may be thresholded, for example SRD5A2 Thresh = 5.5 if SRD5A2 < 5.5 or SRD5A2 if SRD5A2 ≥ 5.5, and TPX2 Thresh = 5.0 if TPX2 < 5.0 or TPX2 if TPX2 ≥ 5.0, wherein the gene symbols represent normalized gene expression values.
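The thresholding rule above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, and the two floor values are the ones quoted in the preceding paragraph.

```python
def threshold(value: float, floor: float) -> float:
    """Return `floor` when the normalized expression value is below it, else the value."""
    return floor if value < floor else value


# Floors quoted in the text; the constant names are hypothetical.
SRD5A2_FLOOR = 5.5
TPX2_FLOOR = 5.0


def srd5a2_thresh(srd5a2: float) -> float:
    """SRD5A2 Thresh: 5.5 if SRD5A2 < 5.5, otherwise SRD5A2."""
    return threshold(srd5a2, SRD5A2_FLOOR)


def tpx2_thresh(tpx2: float) -> float:
    """TPX2 Thresh: 5.0 if TPX2 < 5.0, otherwise TPX2."""
    return threshold(tpx2, TPX2_FLOOR)
```

A value exactly at the floor is returned unchanged, which matches the "greater than or equal to" branch of the rule.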

The unscaled RS scores derived from Table 4 can also be rescaled to be between 0 and 100. For example, RS27 can be rescaled to be between 0 and 100 as follows:

RS (scaled) = 0 if 13.4 × (RSu + 10.5) < 0; 13.4 × (RSu + 10.5) if 0 ≤ 13.4 × (RSu + 10.5) ≤ 100; or 100 if 13.4 × (RSu + 10.5) > 100, where RSu denotes the unscaled RS.
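The piecewise rule for RS27 amounts to a linear transform clamped to the interval from 0 to 100. A minimal sketch, assuming the unscaled score is supplied as a plain float (the function name is hypothetical):

```python
def rescale_rs27(rs_unscaled: float) -> float:
    """Map an unscaled RS27 value onto the 0 to 100 scale.

    Uses the constants quoted in the text: scaled = 13.4 * (RSu + 10.5),
    clamped to 0 below and to 100 above.
    """
    scaled = 13.4 * (rs_unscaled + 10.5)
    return min(100.0, max(0.0, scaled))
```

For instance, an unscaled score of -8.0 maps to 13.4 × 2.5 = 33.5, while any unscaled score at or below -10.5 maps to 0.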

Using the scaled RS, patients can be classified into low, intermediate, and high RS groups using pre-specified cut-points defined below in Table B. These cut-points define the boundaries between low and intermediate RS groups and between intermediate and high RS groups. The cut-points were derived from the discovery study with the intent of identifying substantial proportions of patients who on average had clinically meaningful low or high risk of aggressive disease. The scaled RS is rounded to the nearest integer before the cut-points defining RS groups are applied.

TABLE B

RS Group        Risk Score
Low             Less than 16
Intermediate    Greater than or equal to 16 and less than 30
High            Greater than or equal to 30
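The grouping in Table B can be sketched as follows. The text says only that the scaled RS is rounded to the nearest integer; how exact halves are handled is not stated, so rounding halves upward is an assumption here, as is the function name.

```python
import math


def rs_group(scaled_rs: float) -> str:
    """Classify a scaled RS (0 to 100) into the Table B groups."""
    # Assumption: exact halves round upward. The text does not specify a tie rule.
    rounded = math.floor(scaled_rs + 0.5)
    if rounded < 16:
        return "Low"
    if rounded < 30:
        return "Intermediate"
    return "High"
```

Because rounding happens before the cut-points are applied, a scaled RS of 15.4 falls in the low group while 15.5 (under the stated tie assumption) falls in the intermediate group.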

TABLE 5A

            Upgrading             Significant           Upstaging             Significant Upgrading
                                  Upgrading                                   or Upstaging
RS    N     OR     95% CI         OR     95% CI         OR     95% CI         OR     95% CI
RS0   280   1.72   (1.22, 2.41)   7.51   (4.37, 12.9)   2.01   (1.41, 2.88)   2.91   (1.95, 4.34)
RS1   287   1.73   (1.21, 2.48)   5.98   (3.30, 10.8)   1.99   (1.40, 2.82)   2.68   (1.80, 3.97)
RS2   287   1.72   (1.19, 2.48)   5.89   (3.18, 10.9)   2.02   (1.42, 2.86)   2.67   (1.80, 3.95)
RS3   287   1.71   (1.20, 2.45)   6.30   (3.66, 10.8)   1.96   (1.38, 2.80)   2.69   (1.84, 3.93)
RS4   287   1.69   (1.18, 2.42)   6.06   (3.48, 10.5)   1.99   (1.40, 2.82)   2.65   (1.82, 3.86)
RS5   288   1.78   (1.21, 2.62)   5.60   (3.56, 8.81)   2.24   (1.59, 3.15)   2.87   (1.93, 4.28)
RS6   287   1.94   (1.37, 2.74)   10.16  (5.82, 17.8)   2.07   (1.48, 2.91)   3.11   (2.07, 4.67)
RS7   288   1.91   (1.34, 2.71)   9.34   (5.25, 16.6)   2.06   (1.47, 2.89)   3.01   (2.02, 4.48)
RS8   289   1.80   (1.27, 2.55)   7.49   (4.02, 14.0)   2.09   (1.49, 2.92)   2.86   (1.97, 4.14)
RS9   288   2.00   (1.39, 2.89)   9.56   (5.06, 18.0)   1.99   (1.42, 2.79)   3.09   (2.08, 4.60)
RS10  287   1.94   (1.37, 2.75)   10.12  (5.79, 17.7)   2.09   (1.49, 2.94)   3.14   (2.08, 4.74)
RS11  288   2.09   (1.43, 3.05)   9.46   (5.18, 17.3)   2.17   (1.54, 3.05)   3.42   (2.24, 5.23)
RS12  287   2.10   (1.45, 3.04)   10.41  (5.92, 18.3)   2.17   (1.55, 3.06)   3.52   (2.30, 5.40)
RS13  287   2.10   (1.44, 3.05)   9.40   (5.50, 16.1)   2.20   (1.55, 3.13)   3.50   (2.25, 5.43)
RS14  287   2.06   (1.42, 2.99)   9.71   (5.65, 16.7)   2.18   (1.55, 3.08)   3.53   (2.29, 5.44)
RS15  288   1.92   (1.32, 2.78)   7.93   (4.56, 13.8)   2.12   (1.51, 2.99)   3.25   (2.20, 4.80)
RS16  288   1.76   (1.23, 2.52)   7.10   (4.12, 12.2)   1.99   (1.41, 2.82)   2.94   (1.98, 4.38)
RS17  286   2.23   (1.52, 3.27)   7.52   (4.18, 13.5)   2.91   (1.93, 4.38)   4.48   (2.72, 7.38)
RS18  286   2.12   (1.46, 3.08)   7.04   (3.87, 12.8)   2.89   (1.91, 4.37)   4.30   (2.62, 7.06)
RS19  287   2.14   (1.46, 3.13)   6.90   (3.80, 12.5)   2.88   (1.96, 4.23)   4.20   (2.66, 6.63)
RS20  286   2.30   (1.55, 3.42)   8.41   (4.65, 15.2)   2.90   (1.98, 4.25)   4.78   (3.00, 7.61)
RS21  287   2.36   (1.59, 3.52)   8.83   (4.87, 16.0)   2.63   (1.76, 3.94)   4.93   (3.06, 7.93)
RS22  286   2.16   (1.48, 3.15)   7.57   (4.14, 13.8)   2.90   (1.96, 4.27)   4.39   (2.75, 7.01)
RS23  287   2.26   (1.53, 3.35)   7.46   (4.24, 13.1)   2.80   (1.85, 4.24)   4.79   (2.98, 7.68)
RS24  286   2.21   (1.50, 3.24)   8.01   (4.38, 14.7)   2.93   (1.99, 4.31)   4.62   (2.89, 7.39)
RS25  287   2.25   (1.53, 3.31)   7.70   (4.25, 14.0)   2.76   (1.83, 4.16)   4.76   (2.99, 7.58)
RS26  287   2.23   (1.51, 3.29)   6.67   (3.52, 12.7)   2.64   (1.81, 3.86)   4.01   (2.56, 6.28)
RS27  287   2.23   (1.51, 3.29)   6.67   (3.52, 12.7)   2.64   (1.81, 3.86)   4.01   (2.56, 6.28)

TABLE 5B

            Upgrading             Significant           Upstaging             Significant Upgrading
                                  Upgrading                                   or Upstaging
Model N     Std OR 95% CI         Std OR 95% CI         Std OR 95% CI         Std OR 95% CI
RS0   166   1.16   (0.84, 1.58)   2.45   (1.61, 3.73)   2.42   (1.61, 3.62)   3      (1.98, 4.56)
RS1   167   1.05   (0.77, 1.43)   2.46   (1.63, 3.71)   2.38   (1.61, 3.53)   3.36   (2.18, 5.18)
RS2   167   1.04   (0.76, 1.42)   2.45   (1.63, 3.69)   2.34   (1.58, 3.46)   3.25   (2.12, 4.99)
RS3   167   1.04   (0.76, 1.41)   2.56   (1.69, 3.89)   2.28   (1.55, 3.36)   3.27   (2.13, 5.03)
RS4   167   1.03   (0.75, 1.40)   2.54   (1.68, 3.86)   2.23   (1.52, 3.27)   3.16   (2.07, 4.82)
RS5   167   1.02   (0.75, 1.39)   1.89   (1.28, 2.78)   1.77   (1.23, 2.55)   2.21   (1.52, 3.20)
RS6   167   1.08   (0.79, 1.48)   2.49   (1.64, 3.79)   2.42   (1.62, 3.62)   3.22   (2.09, 4.96)
RS7   167   1.03   (0.75, 1.40)   2.31   (1.54, 3.48)   2.28   (1.54, 3.38)   2.97   (1.96, 4.51)
RS8   167   0.94   (0.69, 1.28)   2.34   (1.55, 3.53)   2.31   (1.56, 3.43)   2.87   (1.91, 4.30)
RS9   167   1.02   (0.75, 1.39)   2.19   (1.47, 3.27)   2.22   (1.51, 3.27)   2.77   (1.85, 4.14)
RS10  167   1.08   (0.79, 1.48)   2.49   (1.63, 3.78)   2.41   (1.61, 3.61)   3.22   (2.09, 4.95)
RS11  167   0.99   (0.73, 1.35)   2.18   (1.46, 3.24)   2.17   (1.48, 3.19)   2.83   (1.88, 4.25)
RS12  167   1.06   (0.78, 1.45)   2.36   (1.57, 3.56)   2.34   (1.57, 3.48)   3.12   (2.04, 4.78)
RS13  166   1.01   (0.74, 1.37)   2.17   (1.45, 3.23)   2.41   (1.61, 3.60)   2.99   (1.97, 4.54)
RS14  166   1.03   (0.76, 1.41)   2.22   (1.48, 3.31)   2.33   (1.57, 3.46)   2.95   (1.94, 4.47)
RS15  166   1      (0.73, 1.36)   1.98   (1.34, 2.92)   2.12   (1.44, 3.12)   2.58   (1.74, 3.84)
RS16  166   0.94   (0.69, 1.28)   1.7    (1.16, 2.48)   2.07   (1.41, 3.03)   2.24   (1.54, 3.25)
RS17  165   0.98   (0.72, 1.34)   1.96   (1.33, 2.89)   2.63   (1.73, 3.98)   3.02   (1.99, 4.60)
RS18  165   0.97   (0.71, 1.33)   1.86   (1.26, 2.73)   2.71   (1.78, 4.13)   3.01   (1.98, 4.56)
RS19  165   0.93   (0.68, 1.27)   1.86   (1.27, 2.72)   2.4    (1.61, 3.58)   2.75   (1.84, 4.10)
RS20  166   1.07   (0.78, 1.46)   2.2    (1.48, 3.29)   2.47   (1.65, 3.69)   3.1    (2.04, 4.72)
RS21  166   1.06   (0.77, 1.45)   2.2    (1.47, 3.28)   2.48   (1.65, 3.71)   3.11   (2.04, 4.74)
RS22  166   1.04   (0.76, 1.43)   2.21   (1.48, 3.29)   2.47   (1.65, 3.70)   3.14   (2.05, 4.79)
RS23  165   1.02   (0.75, 1.40)   2.01   (1.36, 2.97)   2.52   (1.67, 3.79)   2.94   (1.95, 4.44)
RS24  166   1.04   (0.76, 1.42)   2.18   (1.46, 3.26)   2.52   (1.68, 3.78)   3.14   (2.06, 4.80)
RS25  166   1.04   (0.76, 1.42)   2.11   (1.42, 3.13)   2.45   (1.64, 3.67)   3      (1.98, 4.54)
RS26  166   0.99   (0.72, 1.35)   2.05   (1.38, 3.04)   2.43   (1.63, 3.65)   2.82   (1.88, 4.21)
RS27  166   0.99   (0.72, 1.35)   2.05   (1.38, 3.04)   2.43   (1.63, 3.65)   2.82   (1.88, 4.21)

TABLE 6A

                                 Time to cR        Upgrading             Significant           Upstaging             Significant Upgrading
                                                                         Upgrading                                   or Upstaging
Model                            N    Std HR  N    Std OR 95% CI         Std OR 95% CI         Std OR 95% CI         Std OR 95% CI
RS25                             428  2.82    232  2.09   (1.41, 3.10)   7.35   (3.87, 14.0)   2.55   (1.63, 4.00)   4.46   (2.72, 7.32)
Stromal                          430  2.05    234  1.32   (0.95, 1.84)   3.08   (1.84, 5.14)   1.6    (1.12, 2.30)   1.95   (1.35, 2.82)
Cellular Organization            430  1.67    234  1.67   (1.16, 2.39)   2.83   (1.63, 4.90)   1.38   (0.96, 1.99)   2.06   (1.37, 3.10)
PSA                              430  1.89    234  0.96   (0.70, 1.32)   1.38   (0.72, 2.63)   1.47   (1.06, 2.03)   1.25   (0.83, 1.88)
ECM Cellular Organization        430  2.6     234  2      (1.37, 2.93)   11.5   (5.84, 22.7)   1.98   (1.34, 2.93)   4.01   (2.44, 6.58)
ECM PSA                          430  2.45    234  1.17   (0.85, 1.61)   2.46   (1.44, 4.21)   1.7    (1.21, 2.39)   1.76   (1.22, 2.53)
Cellular Organization PSA        430  2.04    234  1.3    (0.92, 1.82)   2.52   (1.23, 5.16)   1.63   (1.13, 2.36)   1.85   (1.19, 2.87)
ECM Cellular Organization TPX2   429  2.61    233  1.89   (1.31, 2.72)   11.3   (5.46, 23.5)   1.94   (1.31, 2.87)   3.99   (2.44, 6.54)
ECM PSA TPX2                     429  2.42    233  1.24   (0.90, 1.71)   3.25   (1.91, 5.51)   1.75   (1.22, 2.49)   2.08   (1.45, 2.98)
Cellular Organization PSA TPX2   429  2.04    233  1.33   (0.95, 1.86)   3.2    (1.74, 5.90)   1.69   (1.17, 2.44)   2.21   (1.45, 3.37)
ECM Cellular Organization GSTM2  430  2.67    234  2.03   (1.39, 2.96)   11.3   (5.72, 22.3)   2.17   (1.43, 3.30)   4.35   (2.50, 7.58)
ECM PSA GSTM2                    430  2.86    234  1.48   (1.05, 2.09)   4.45   (2.03, 9.76)   2.2    (1.45, 3.34)   2.66   (1.64, 4.31)
Cellular Organization PSA GSTM2  430  2.25    234  1.34   (0.94, 1.90)   2.52   (1.18, 5.38)   1.92   (1.29, 2.84)   2.02   (1.20, 3.39)
ECM Cellular Organization GSTM2  428  2.72    232  2.38   (1.58, 3.57)   11.5   (6.02, 21.8)   2.48   (1.58, 3.87)   5.22   (2.97, 9.17)
  TPX2 AZGP1 SRD5A2
ECM PSA GSTM2 TPX2 AZGP1 SRD5A2  428  2.8     232  2.03   (1.38, 3.00)   6.65   (3.52, 12.6)   2.6    (1.65, 4.09)   4.26   (2.58, 7.02)
Cellular Organization PSA GSTM2  428  2.38    232  1.92   (1.28, 2.88)   3.63   (2.14, 6.15)   2.6    (1.64, 4.12)   3.49   (2.08, 5.83)
  TPX2 AZGP1 SRD5A2

TABLE 6B

                                      Upgrading             Significant           Upstaging             Significant Upgrading
                                                            Upgrading                                   or Upstaging
Model                            N    Std OR 95% CI         Std OR 95% CI         Std OR 95% CI         Std OR 95% CI
RS25                             166  1.04   (0.76, 1.42)   2.11   (1.42, 3.13)   2.45   (1.64, 3.67)   3      (1.98, 4.54)
Stromal                          166  0.99   (0.73, 1.35)   2.19   (1.45, 3.32)   1.65   (1.15, 2.38)   1.86   (1.31, 2.65)
Cellular Organization            167  1.06   (0.77, 1.44)   0.93   (0.64, 1.36)   1.49   (1.04, 2.13)   1.44   (1.03, 2.00)
PSA                              167  1.04   (0.76, 1.42)   1.68   (1.16, 2.44)   1.78   (1.24, 2.57)   1.96   (1.37, 2.81)
ECM Cellular Organization        166  1.04   (0.76, 1.42)   1.96   (1.32, 2.91)   2.32   (1.55, 3.45)   2.6    (1.76, 3.85)
ECM PSA                          166  1.02   (0.75, 1.39)   2.14   (1.44, 3.20)   1.84   (1.28, 2.67)   2.11   (1.47, 3.04)
Cellular Organization PSA        167  1.07   (0.78, 1.46)   1.36   (0.94, 1.97)   2.06   (1.40, 3.04)   2.12   (1.47, 3.06)
ECM Cellular Organization TPX2   166  1.15   (0.84, 1.58)   2.24   (1.49, 3.37)   2.55   (1.69, 3.85)   2.95   (1.96, 4.45)
ECM PSA TPX2                     166  1.2    (0.88, 1.65)   2.66   (1.71, 4.13)   2.28   (1.53, 3.40)   2.72   (1.82, 4.07)
Cellular Organization PSA TPX2   167  1.3    (0.95, 1.79)   1.77   (1.21, 2.60)   2.42   (1.62, 3.63)   2.65   (1.79, 3.92)
ECM Cellular Organization GSTM2  166  0.96   (0.70, 1.30)   1.76   (1.20, 2.57)   2.12   (1.44, 3.12)   2.34   (1.60, 3.42)
ECM PSA GSTM2                    166  0.91   (0.67, 1.24)   1.69   (1.16, 2.46)   1.85   (1.28, 2.67)   2.05   (1.42, 2.94)
Cellular Organization PSA GSTM2  167  0.89   (0.65, 1.22)   1.13   (0.78, 1.62)   1.72   (1.19, 2.48)   1.75   (1.23, 2.48)
ECM Cellular Organization GSTM2  166  1.04   (0.76, 1.42)   2.14   (1.44, 3.20)   2.47   (1.65, 3.70)   2.94   (1.95, 4.44)
  TPX2 AZGP1 SRD5A2
ECM PSA GSTM2 TPX2 AZGP1 SRD5A2  166  1.03   (0.75, 1.41)   2.11   (1.42, 3.13)   2.39   (1.61, 3.57)   2.94   (1.95, 4.45)
Cellular Organization PSA GSTM2  167  1.07   (0.78, 1.46)   1.84   (1.26, 2.68)   2.22   (1.51, 3.27)   2.73   (1.83, 4.09)
  TPX2 AZGP1 SRD5A2

Example 3 Clique Stack Analysis to Identify Co-Expressed Genes

The purpose of the gene clique stacks method described in this Example was to find a set of co-expressed (or surrogate) biomarkers that can be used to reliably predict outcome as well as or better than the genes disclosed above. The method used to identify the co-expressed markers is illustrated in FIG. 4. The set of co-expressed biomarkers was obtained by seeding the maximal clique enumeration (MCE) with curated biomarkers extracted from the scientific literature. The maximal clique enumeration (MCE) method [Bron et al, 1973] aggregates genes into tightly co-expressed groups such that all of the genes in the group have a similar expression profile. When all of the genes in a group satisfy a minimal similarity condition, the group is called a clique. When a clique is as large as possible without admitting any ‘dissimilar’ genes into the clique, then the clique is said to be maximal. Using the MCE method, all maximal cliques are searched within a dataset. Using this method, almost any degree of overlap between the maximal cliques can be found, as long as the overlap is supported by the data. Maximal clique enumeration has been shown [Borate et al, 2009] to be an effective way of identifying co-expressed gene modules (CGMs).

1. Definitions

The following table defines a few terms commonly used in the gene clique stack analyses.

TABLE 7

Term                     Definition
Node                     The abundance of a gene (for the purposes of CGM analysis)
Edge                     A line connecting two nodes, indicating co-expression of the two nodes
Graph                    A collection of nodes and edges
Clique                   A graph with an edge connecting all pair-wise combinations of nodes in the graph
Maximal clique           A clique that is not contained in any other clique
Stack                    A graph obtained by merging at least two cliques or stacks such that the overlap between the two cliques or stacks exceeds some user-defined threshold
Gene expression profile  A two-dimensional matrix, with genes listed down the rows and samples listed across the columns. Each (i, j) entry in the matrix corresponds to relative mRNA abundance for gene i and sample j.

2. Examples of Cliques and Stacks

FIG. 5 shows a family of three different graphs. A graph consists of nodes (numbered) and connecting edges (lines). FIG. 5(a) is not a clique because there is no edge connecting nodes 3 and 4. FIG. 5(b) is a clique because there is an edge connecting all pair-wise combinations of nodes in the graph. FIG. 5(c) is a clique, but not a maximal clique, because it is contained in clique (b). Given a graph with connecting edges, the MCE algorithm will systematically list all of the maximal cliques with 3 or more nodes. For example, the graph in FIG. 6 has two maximal cliques: 1-2-3-4-5 and 1-2-3-4-6.
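Maximal clique enumeration can be illustrated with a short serial sketch of the Bron-Kerbosch procedure with pivoting. It is not the parallel MCE implementation used in the study. The figure itself is not reproduced here, so the example graph is an assumption: it is built as the smallest graph whose maximal cliques are 1-2-3-4-5 and 1-2-3-4-6, as stated for FIG. 6. The function names are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj, min_size=3):
    """List maximal cliques with at least `min_size` nodes.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        # Pivot on the node with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def complete_edges(nodes):
    """All pair-wise edges among `nodes`."""
    return set(combinations(sorted(nodes), 2))


# Assumed reconstruction of the FIG. 6 graph: nodes 5 and 6 are not connected.
edges = complete_edges([1, 2, 3, 4, 5]) | complete_edges([1, 2, 3, 4, 6])
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

cliques = maximal_cliques(adj)
```

On this graph the sketch returns exactly the two node sets {1, 2, 3, 4, 5} and {1, 2, 3, 4, 6}; smaller cliques such as {1, 2, 3} are not reported because each is contained in a larger one.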

When based on gene expression data, there are typically large numbers of maximal cliques that are very similar to one another. These maximal cliques can be merged into stacks of maximal cliques. The stacks are the final gene modules of interest and generally are far fewer in number than are the maximal cliques. FIG. 7 schematically illustrates stacking of two maximal cliques.

3. Seeding

For the purposes of finding surrogate co-expressed markers, biomarkers from the literature can be identified and then used to seed the MCE and stacking algorithms. The basic idea is as follows: for each seed, compute a set of maximal cliques (using the parallel MCE algorithm). Then stack the maximal cliques obtained for each seed, yielding a set of seeded stacks. Finally, stack the seeded stacks to obtain a “stack of seeded stacks.” The stack of seeded stacks is an approximation to the stacks that would be obtained by using the conventional (i.e., unseeded) MCE/stacking algorithms. The method used to identify genes that co-express with the genes disclosed above is illustrated in FIG. 4 and is described in more detail below.

3.1 Seeded MCE Algorithm (Steps 1-4)

1. The process begins by identifying an appropriate set, S_(s), of seeding genes. In the instant case, the seeding genes were selected from the gene subsets disclosed above.

2. With the seeding genes specified, select a measure of correlation, R(g₁,g₂), between the gene expression profiles of any two genes, g₁, g₂, along with a correlation threshold below which g₁, g₂ can be considered uncorrelated. For each seeding gene s in the seeding set S_(s), find all gene pairs (s,g) in the dataset such that R(s,g) is greater than or equal to the correlation threshold. Let G_(s) be the union of s and the set of all genes correlated with s. For the instant study, the Spearman coefficient was used as the measure of correlation and 0.7 as the correlation threshold.

3. Compute the correlation coefficient for each pair-wise combination of genes (g_(i),g_(j)) in G_(s). Let X_(s) be the set of all gene pairs for which R(g_(i),g_(j)) is greater than or equal to the correlation threshold. If the genes were plotted as in FIG. 5, there would be an edge (line) between each pair of genes in X_(s).

4. Run the MCE algorithm, as described in Schmidt et al (J. Parallel Distrib. Comput. 69 (2009) 417-428), on the gene pairs X_(s) for each seeding gene.
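Steps 2 and 3 above can be sketched in plain Python as follows. This is an illustrative reading of the text rather than the study's implementation: the function names, the dictionary layout of the expression profiles, and the toy gene names and values are all hypothetical. The Spearman coefficient is computed as the Pearson correlation of average ranks, and the 0.7 threshold is the one quoted in step 2.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no rank information
    return cov / sqrt(vx * vy)


def seeded_edges(profiles, seed, threshold=0.7):
    """Return (G_s, X_s) for one seeding gene.

    profiles maps gene name -> expression values across the same samples.
    G_s: the seed plus every gene with R(seed, gene) >= threshold (step 2).
    X_s: all pairs within G_s with R >= threshold (step 3).
    """
    g_s = {seed} | {
        g for g in profiles
        if g != seed and spearman(profiles[seed], profiles[g]) >= threshold
    }
    x_s = {
        (a, b) for a, b in combinations(sorted(g_s), 2)
        if spearman(profiles[a], profiles[b]) >= threshold
    }
    return g_s, x_s


# Toy data (hypothetical genes and values), five samples per gene.
profiles = {
    "SEED": [1, 2, 3, 4, 5],
    "GENE_A": [2, 4, 6, 8, 11],  # same ordering as SEED
    "GENE_B": [1, 3, 2, 5, 4],   # mostly the same ordering
    "GENE_C": [5, 4, 3, 2, 1],   # reversed ordering
}
g_s, x_s = seeded_edges(profiles, "SEED")
```

The pairs in X_(s) are the edges that would then be passed to the MCE algorithm in step 4. In the toy data GENE_C is anti-correlated with the seed and is therefore excluded from G_(s).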

3.2 Seeded Stacking Algorithm (Steps 5-6)

The purpose of stacking is to reduce the number of cliques down to a manageable number of gene modules (stacks). Continuing with steps 5 and 6 of FIG. 4:

5. For each seeding gene, sort the cliques from largest to smallest, i.e., from the greatest number of nodes to the fewest. Starting with the largest clique, find among the remaining cliques the one with the greatest overlap. If the overlap exceeds a user-specified threshold T, merge the two cliques together to form the first stack. Re-sort the cliques and stack(s) from largest to smallest and repeat the overlap test and merging. Repeat the process until no new merges occur.

6. One now has a set of stacks for each seeding gene. In the final step, all of the seeded stacks are combined into one set of stacks, σ. As the final computation, all of the stacks in σ are stacked, just as in step 5. This stack of stacks is the set of gene modules used for the instant study.

Genes that were shown to co-express with genes identified by this method are shown in Tables 8-11. “Stack ID” in the Tables is simply an index to enumerate the stacks and “probeWt” refers to the probe weight, or the number of times a probe (gene) appears in the stack.
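The stacking of steps 5 and 6, together with the probe weight bookkeeping, can be sketched as follows. Several details are assumptions made for illustration only, since the text does not fix them: overlap is measured as the number of shared genes divided by the size of the smaller group, a merge requires the overlap to be strictly greater than T, and when the largest group has no partner above T the next largest is tried. The function names and the toy cliques are hypothetical.

```python
from collections import Counter


def overlap(a, b):
    """Assumed overlap measure: shared genes / size of the smaller group."""
    return len(a.keys() & b.keys()) / min(len(a), len(b))


def stack(groups, t):
    """Merge cliques or stacks until no pair overlaps by more than `t`.

    Each group becomes a Counter mapping gene -> probe weight, i.e. the
    number of merged cliques in which that gene appeared.
    """
    items = [Counter(g) for g in groups]
    merged = True
    while merged:
        merged = False
        items.sort(key=len, reverse=True)  # largest to smallest
        for i, big in enumerate(items):
            others = [k for k in range(len(items)) if k != i]
            if not others:
                break
            j = max(others, key=lambda k: overlap(big, items[k]))
            if overlap(big, items[j]) > t:
                combined = big + items[j]  # weights add up on merge
                items = [it for k, it in enumerate(items) if k not in (i, j)]
                items.append(combined)
                merged = True
                break  # re-sort and start again
    return items


# Toy cliques for one seed (hypothetical gene names).
toy_cliques = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"X", "Y", "Z"},
]
stacks = stack(toy_cliques, t=0.7)
largest = max(stacks, key=len)
```

Step 6 would correspond to concatenating the stacks obtained for every seeding gene into one list and calling the same function on that list. In the toy example the first two cliques share three of four genes and merge into one stack in which A, B and C carry a probe weight of 2, while the third clique remains a separate stack.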

TABLE 8 Coexpressed Coexpressed Coexpressed StackID Gene ProbeWtSeedingGene StackID Gene ProbeWt SeedingGene StackID Gene ProbeWtSeedingGene 1 SLCO2B1 1 BGN 1 SPARC 1 SPARC 1 SPARC 1 COL4A1 1 LHFP 1BGN 1 COL4A1 1 SPARC 1 COL4A1 1 COL4A1 1 ENG 1 BGN 1 COL4A2 1 SPARC 1HTRA1 1 COL4A1 2 LHFP 1 BGN 2 COL3A1 1 SPARC 2 COL4A1 3 COL4A1 2 THY1 1BGN 2 SPARC 1 SPARC 2 NID1 3 COL4A1 2 ENG 1 BGN 2 COL4A1 1 SPARC 2 CD932 COL4A1 3 COL1A1 1 BGN 2 VCAN 1 SPARC 2 FBN1 2 COL4A1 3 THY1 1 BGN 2FN1 1 SPARC 2 COL1A1 1 COL4A1 3 ENG 1 BGN 3 HEG1 3 SPARC 2 MCAM 1 COL4A14 COL1A1 1 BGN 3 MEF2C 3 SPARC 2 SPARC 1 COL4A1 4 PDGFRB 1 BGN 3 RGS5 2SPARC 3 COL1A2 4 COL4A1 4 FMNL3 1 BGN 3 KDR 2 SPARC 3 COL4A1 4 COL4A1 5SLCO2B1 1 BGN 3 LAMA4 1 SPARC 3 VCAN 2 COL4A1 5 LHFP 1 BGN 3 SPARC 1SPARC 3 FN1 2 COL4A1 5 COL3A1 1 BGN 4 COL3A1 5 SPARC 3 COL1A1 2 COL4A1 6THY1 1 BGN 4 SPARC 5 SPARC 3 NID1 2 COL4A1 6 LHFP 1 BGN 4 COL1A1 3 SPARC3 HTRA1 1 COL4A1 6 COL3A1 1 BGN 4 COL1A2 2 SPARC 3 COL6A3 1 COL4A1 7THY1 1 BGN 4 BGN 2 SPARC 7 COL1A1 1 BGN 4 PDGFRB 2 SPARC 1 INHBA 1 INHBA7 COL3A1 1 BGN 4 COL4A1 1 SPARC 1 STMN2 1 INHBA 8 BGN 7 BGN 4 IGFBP7 1SPARC 1 COL10A1 1 INHBA 8 COL1A1 4 BGN 4 FBN1 1 SPARC 8 COL3A1 4 BGN 5SPARC 4 SPARC 1 THBS2 1 THBS2 8 FMNL3 4 BGN 5 PDGFRB 4 SPARC 1 COL3A1 1THBS2 8 SLCO2B1 3 BGN 5 DPYSL2 3 SPARC 1 VCAN 1 THBS2 8 SPARC 3 BGN 5FBN1 3 SPARC 8 ENG 3 BGN 5 HEG1 2 SPARC 8 PDGFRB 3 BGN 5 CDH11 2 SPARC 8THBS2 1 BGN 5 FBLN5 2 SPARC 1 THBS2 1 COL3A1 5 LAMA2 2 SPARC 1 COL3A1 1COL3A1 5 IGFBP7 2 SPARC 1 VCAN 1 COL3A1 5 LAMA4 2 SPARC 2 COL3A1 3COL3A1 5 RGS5 1 SPARC 2 SPARC 3 COL3A1 5 COL4A2 1 SPARC 2 FN1 2 COL3A1 5COL1A2 1 SPARC 2 COL4A1 2 COL3A1 6 FBN1 7 SPARC 2 VCAN 1 COL3A1 6 LAMA46 SPARC 2 COL1A1 1 COL3A1 6 SGK269 5 SPARC 2 FBN1 1 COL3A1 6 CDH11 5SPARC 3 COL1A2 3 COL3A1 6 DPYSL2 5 SPARC 3 PDGFRB 3 COL3A1 6 LAMA2 5SPARC 3 IGFBP7 3 COL3A1 6 SPARC 4 SPARC 3 FBN1 3 COL3A1 6 SULF1 4 SPARC3 CDH11 2 COL3A1 6 FBLN5 3 SPARC 3 AEBP1 2 COL3A1 6 LTBP1 3 SPARC 3COL3A1 1 COL3A1 6 EPB41L2 3 SPARC 3 SPARC 1 
COL3A1 6 MEF2C 3 SPARC 4COL3A1 5 COL3A1 6 FN1 2 SPARC 4 BGN 4 COL3A1 6 EDIL3 2 SPARC 4 COL1A1 3COL3A1 6 COL3A1 1 SPARC 4 SPARC 3 COL3A1 6 IGFBP7 1 SPARC 4 FMNL3 2COL3A1 6 HEG1 1 SPARC 4 PDGFRB 2 COL3A1 4 COL1A2 1 COL3A1 4 THY1 1COL3A1 4 THBS2 1 COL3A1

TABLE 9 Coexpressed Probe- Seeding Coexpressed Probe- SeedingCoexpressed Seeding StackID Gene Wt Gene StackID Gene Wt Gene StackIDGene ProbeWt Gene 1 DDR2 26870 C7 1 MYH11 168 GSTM2 1 PPAP2B 15794SRD5A2 1 SPARCL1 25953 C7 1 TGFBR3 163 GSTM2 1 VWA5A 12616 SRD5A2 1 FAT424985 C7 1 RBMS3 162 GSTM2 1 SPON1 12395 SRD5A2 1 SYNE1 24825 C7 1 FHL1161 GSTM2 1 FAT4 12218 SRD5A2 1 SLC8A1 24327 C7 1 MYLK 158 GSTM2 1 SSPN12126 SRD5A2 1 MEIS1 23197 C7 1 CACHD1 155 GSTM2 1 MKX 11552 SRD5A2 1PRRX1 22847 C7 1 TIMP3 154 GSTM2 1 PRRX1 11061 SRD5A2 1 CACHD1 22236 C71 SYNM 152 GSTM2 1 LOC645954 10811 SRD5A2 1 DPYSL3 20623 C7 1 NEXN 147GSTM2 1 SYNM 10654 SRD5A2 1 LTBP1 20345 C7 1 MYL9 142 GSTM2 1 ANXA610330 SRD5A2 1 SGK269 19461 C7 1 CRYAB 141 GSTM2 1 PDE5A 10011 SRD5A2 1EDNRA 19280 C7 1 VWA5A 131 GSTM2 1 TSHZ3 9588 SRD5A2 1 TRPC4 18689 C7 1AOX1 130 GSTM2 1 GSN 9505 SRD5A2 1 TIMP3 18674 C7 1 FLNC 127 GSTM2 1NID2 9503 SRD5A2 1 TGFBR3 18367 C7 1 PPAP2B 125 GSTM2 1 CLU 9304 SRD5A21 ZEB1 18355 C7 1 GSTM2 118 GSTM2 1 TPM2 8659 SRD5A2 1 C1S 16871 C7 1C21orf63 101 GSTM2 1 FBLN1 8068 SRD5A2 1 ABCC9 16562 C7 1 POPDC2 72GSTM2 1 PARVA 7949 SRD5A2 1 PCDH18 14936 C7 1 TPM2 66 GSTM2 1 SPOCK37772 SRD5A2 1 C7 14789 C7 1 CDC42EP3 60 GSTM2 1 PCDH18 7514 SRD5A2 1PDGFC 14748 C7 1 CCDC69 58 GSTM2 1 ILK 7078 SRD5A2 1 PTPLAD2 13590 C7 1CRISPLD2 52 GSTM2 1 ITIH5 6903 SRD5A2 1 VCL 13332 C7 1 GBP2 47 GSTM2 1ADCY5 6374 SRD5A2 1 MMP2 13107 C7 1 ADCY5 44 GSTM2 1 CRYAB 6219 SRD5A2 1FERMT2 12681 C7 1 MATN2 40 GSTM2 1 RBMS3 6108 SRD5A2 1 EPB41L2 12335 C71 AOC3 38 GSTM2 1 AOX1 4943 SRD5A2 1 PRNP 12133 C7 1 ACACB 36 GSTM2 1WWTR1 4789 SRD5A2 1 FBN1 11965 C7 1 RND3 28 GSTM2 1 AOC3 4121 SRD5A2 1GLT8D2 11954 C7 1 CLIP4 26 GSTM2 1 CAP2 4091 SRD5A2 1 DSE 11888 C7 1APOBEC3C 20 GSTM2 1 MAP1B 3917 SRD5A2 1 SCN7A 11384 C7 1 CAV2 18 GSTM2 1OGN 3893 SRD5A2 1 PPAP2B 11121 C7 1 TRIP10 17 GSTM2 1 PLN 3581 SRD5A2 1PGR 10566 C7 1 TCF21 11 GSTM2 1 CFL2 2857 SRD5A2 1 PALLD 10240 C7 1CAMK2G 11 GSTM2 1 MATN2 2808 SRD5A2 1 CNTN1 10113 C7 1 
GSTM5P1 9 GSTM2 1ADRA1A 2694 SRD5A2 1 SERPING1 9800 C7 1 ACSS3 9 GSTM2 1 BOC 2401 SRD5A21 DKK3 9279 C7 1 GSTM4 7 GSTM2 1 ANGPT1 2290 SRD5A2 1 CCND2 9131 C7 1GSTP1 5 GSTM2 1 POPDC2 2205 SRD5A2 1 MSRB3 8502 C7 1 GSTM1 3 GSTM2 1FGF2 2162 SRD5A2 1 LAMA4 8477 C7 1 GSTM3 2 GSTM2 1 TCF21 1996 SRD5A2 1RBMS3 8425 C7 1 GSTM2P1 2 GSTM2 1 LOC283904 1983 SRD5A2 1 FBLN1 7968 C71 TGFB3 1 GSTM2 1 DNAJB5 1773 SRD5A2 1 EPHA3 6930 C7 1 FTO 1 IGF1 1TSPAN2 1731 SRD5A2 1 ACTA2 6824 C7 1 UTP11L 1 IGF1 1 GSTM5 1635 SRD5A2 1ADAM22 6791 C7 1 SGCB 1 IGF1 1 RGN 1594 SRD5A2 1 WWTR1 6611 C7 2 CHP 14IGF1 1 PDLIM7 1503 SRD5A2 1 HEPH 6406 C7 2 RP2 14 IGF1 1 MITF 1481SRD5A2 1 TIMP2 6219 C7 2 SPRYD4 14 IGF1 1 BNC2 1300 SRD5A2 1 CLIC4 6151C7 2 SGCB 13 IGF1 1 SCN7A 1274 SRD5A2 1 ATP2B4 5897 C7 2 INMT 13 IGF1 1GPM6B 1202 SRD5A2 1 TNS1 5842 C7 2 IGF1 12 IGF1 1 ARHGAP20 1193 SRD5A2 1PDGFRA 5802 C7 2 ARPP19 9 IGF1 1 PDZRN4 1190 SRD5A2 1 ITGA1 5781 C7 2MOCS3 9 IGF1 1 PCP4 1107 SRD5A2 1 RHOJ 5103 C7 2 KATNAL1 8 IGF1 1 ANO5987 SRD5A2 1 COL14A1 5063 C7 2 C3orf33 8 IGF1 1 C6orf186 930 SRD5A2 1CALD1 4828 C7 2 SLC16A4 7 IGF1 1 ARHGAP10 793 SRD5A2 1 DCN 4825 C7 2 FTO7 IGF1 1 CLIP4 775 SRD5A2 1 IRAK3 4476 C7 2 SNX27 6 IGF1 1 CCDC69 733SRD5A2 1 MATN2 4448 C7 2 C1orf55 5 IGF1 1 SLC24A3 673 SRD5A2 1 KIT 4329C7 2 C1orf174 4 IGF1 1 ACSS3 668 SRD5A2 1 NEXN 4257 C7 2 SNTN 4 IGF1 1IL33 611 SRD5A2 1 ZEB2 3798 C7 2 MCART6 4 IGF1 1 CAMK2G 519 SRD5A2 1COL6A3 3679 C7 2 OTUD3 4 IGF1 1 PTPLA 505 SRD5A2 1 NID2 3678 C7 2ADAMTS4 4 IGF1 1 EFEMP1 493 SRD5A2 1 PRICKLE2 3671 C7 2 FEZ1 4 IGF1 1KIT 470 SRD5A2 1 OGN 3418 C7 2 SPATA5 4 IGF1 1 ODZ3 428 SRD5A2 1 SSPN3142 C7 2 ZNRF3 4 IGF1 1 MRGPRF 390 SRD5A2 1 SORBS1 3126 C7 2 C1orf229 4IGF1 1 C21orf63 383 SRD5A2 1 PDE5A 2963 C7 2 STX2 4 IGF1 1 CRISPLD2 322SRD5A2 1 LOC732446 2925 C7 2 PURB 4 IGF1 1 MYADM 314 SRD5A2 1 FCHSD22741 C7 2 BVES 4 IGF1 1 C7 278 SRD5A2 1 PMP22 2609 C7 2 DTX3L 4 IGF1 1PDGFRA 219 SRD5A2 1 TRPC1 2519 C7 2 ZNF713 4 IGF1 1 EYA1 199 SRD5A2 1ANXA6 2353 C7 2 DSCR3 4 IGF1 1 
ATP1A2 174 SRD5A2 1 SPON1 2278 C7 2SLC35F1 4 IGF1 1 ACACB 173 SRD5A2 1 FBLN5 2115 C7 2 C22orf25 4 IGF1 1NT5E 168 SRD5A2 1 CHRDL1 1996 C7 2 STK4 4 IGF1 1 GPR124 166 SRD5A2 1MEF2C 1980 C7 2 EIF5A2 4 IGF1 1 LOC652799 165 SRD5A2 1 EFEMP1 1939 C7 2SUPT7L 4 IGF1 1 LRCH2 123 SRD5A2 1 JAZF1 1748 C7 2 C10orf78 4 IGF1 1PYGM 100 SRD5A2 1 DNAJB4 1636 C7 2 ANKS4B 4 IGF1 1 GSTM2 92 SRD5A2 1ARHGEF6 1594 C7 2 C1orf151 4 IGF1 1 KCNAB1 90 SRD5A2 1 MFAP4 1503 C7 2RPL32P3 4 IGF1 1 HHIP 82 SRD5A2 1 LOC652799 1470 C7 2 SEC62 4 IGF1 1ALDH1A2 70 SRD5A2 1 PREX2 1464 C7 2 DBR1 4 IGF1 1 PRDM5 63 SRD5A2 1MAN1A1 1433 C7 2 FLJ39639 4 IGF1 1 ABCA8 59 SRD5A2 1 TCF21 1224 C7 2ZNF543 4 IGF1 1 MAML2 51 SRD5A2 1 CRIM1 1181 C7 2 FRRS1 4 IGF1 1 PAK3 38SRD5A2 1 A2M 1168 C7 2 TATDN3 4 IGF1 1 SNAI2 35 SRD5A2 1 DPYSL2 1029 C72 WDR55 4 IGF1 1 UST 27 SRD5A2 1 GPM6B 993 C7 2 KIAA1737 4 IGF1 1 TMLHE21 SRD5A2 1 PLN 970 C7 2 APOBEC3F 4 IGF1 1 ACTC1 15 SRD5A2 1 IL33 942 C72 RNF7 4 IGF1 1 C5orf4 8 SRD5A2 1 CCDC80 889 C7 2 SIKE1 4 IGF1 1 GSTM5P14 SRD5A2 1 LMO3 852 C7 2 HSP90B3P 4 IGF1 1 GSTM4 3 SRD5A2 1 SEC23A 765C7 2 GNS 4 IGF1 1 PDK4 2 SRD5A2 1 MOXD1 708 C7 2 C1orf212 4 IGF1 1 TGFB32 SRD5A2 1 SPOCK3 622 C7 2 ZNF70 4 IGF1 1 GSTM1 1 SRD5A2 1 HEG1 608 C7 2TMEM127 4 IGF1 1 LOC728846 1 TGFB1I1 1 LUM 589 C7 2 ALDH1B1 4 IGF1 1CLIP3 1 TGFB1I1 1 C7orf58 566 C7 2 HP1BP3 4 IGF1 1 EMILIN1 1 TGFB1I1 1CDC42EP3 539 C7 2 APOL6 4 IGF1 2 CLIP3 1 TGFB1I1 1 CPVL 524 C7 2 MALL 4IGF1 2 MRC2 1 TGFB1I1 1 CPA3 421 C7 2 C11orf17 4 IGF1 2 MEG3 1 TGFB1I1 1SLIT2 417 C7 2 LOC729199 4 IGF1 3 MRC2 1 TGFB1I1 1 KLHL5 376 C7 2 RELL14 IGF1 3 LCAT 1 TGFB1I1 1 HLF 322 C7 2 PELI1 4 IGF1 3 MEG3 1 TGFB1I1 1PLXDC2 313 C7 2 ASB6 4 IGF1 4 LDB3 18 TGFB1I1 1 CAP2 301 C7 2 C2orf18 4IGF1 4 TGFB1I1 15 TGFB1I1 1 FXYD6 291 C7 2 PSTPIP2 4 IGF1 4 ASB2 11TGFB1I1 1 ECM2 272 C7 2 CLEC7A 4 IGF1 4 CLIP3 11 TGFB1I1 1 SRD5A2 245 C72 RAB22A 4 IGF1 4 ITGA7 10 TGFB1I1 1 MBNL1 245 C7 2 LOC643770 4 IGF1 4JPH2 10 TGFB1I1 1 LAMA2 169 C7 2 LOC100129502 4 IGF1 4 RUSC2 10 TGFB1I11 
IL6ST 166 C7 2 ZCCHC4 4 IGF1 4 HRNBP3 8 TGFB1I1 1 PODN 112 C7 2 PNMA24 IGF1 4 LIMS2 8 TGFB1I1 1 ATRNL1 110 C7 2 PIGW 4 IGF1 4 CSPG4 7 TGFB1I11 DOCK11 60 C7 2 SLC25A32 4 IGF1 4 NLGN3 5 TGFB1I1 1 FGL2 56 C7 2 CLCC14 IGF1 4 ADAM33 3 TGFB1I1 1 SPRY2 12 C7 2 KIAA0513 4 IGF1 4 NHSL2 3TGFB1I1 1 OLFML1 12 C7 2 SS18 4 IGF1 4 SYDE1 2 TGFB1I1 1 NEGR1 4 C7 2CECR1 4 IGF1 4 RASL12 2 TGFB1I1 1 IGFBP5 1 C7 2 ZNF490 4 IGF1 4 LOC905862 TGFB1I1 1 SORBS1 1 DES 2 PDE12 4 IGF1 4 GNAZ 1 TGFB1I1 1 CACNA1C 1 DES2 C10orf76 4 IGF1 4 TMEM35 1 TGFB1I1 1 DES 1 DES 2 CCL22 4 IGF1 4 LCAT 1TGFB1I1 2 ITIH5 1 DES 2 RRN3P1 4 IGF1 4 LOC728846 1 TGFB1I1 2 ANXA6 1DES 2 LOC100127925 4 IGF1 4 SLC24A3 1 TGFB1I1 2 ATP1A2 1 DES 2 SC4MOL 4IGF1 5 MRGPRF 381 TGFB1I1 3 ITIH5 1 DES 2 AP4E1 4 IGF1 5 PDLIM7 362TGFB1I1 3 DES 1 DES 2 APOLD1 4 IGF1 5 AOC3 321 TGFB1I1 3 ANXA6 1 DES 2ARSB 4 IGF1 5 ADCY5 317 TGFB1I1 4 TPM1 1 DES 2 ZNF264 4 IGF1 5 KANK2 306TGFB1I1 4 DES 1 DES 2 SLC30A6 4 IGF1 5 SLC24A3 292 TGFB1I1 4 CES1 1 DES2 METTL7A 4 IGF1 5 MYL9 287 TGFB1I1 5 TAGLN 72309 DES 2 PARD6B 4 IGF1 5FLNC 275 TGFB1I1 5 FLNA 72305 DES 2 STOM 4 IGF1 5 TGFB1I1 253 TGFB1I1 5TNS1 72049 DES 2 CYP20A1 4 IGF1 5 ITGA7 222 TGFB1I1 5 CNN1 69837 DES 2LYZ 4 IGF1 5 DES 216 TGFB1I1 5 ACTA2 68389 DES 2 ATP1B4 4 IGF1 5 FLNA214 TGFB1I1 5 CHRDL1 67725 DES 2 SCD5 4 IGF1 5 EFEMP2 206 TGFB1I1 5DPYSL3 67225 DES 2 CEP170L 4 IGF1 5 TAGLN 184 TGFB1I1 5 MSRB3 66488 DES2 NUDT19 4 IGF1 5 RASL12 163 TGFB1I1 5 VCL 65707 DES 2 TXNL4B 4 IGF1 5GAS6 163 TGFB1I1 5 CCND2 65291 DES 2 APPL1 4 IGF1 5 KCNMB1 163 TGFB1I1 5SLC8A1 65217 DES 2 OSBPL2 4 IGF1 5 SMTN 157 TGFB1I1 5 MEIS1 65097 DES 2VMA21 4 IGF1 5 GPR124 140 TGFB1I1 5 ATP2B4 64428 DES 2 NF2 4 IGF1 5COL6A1 133 TGFB1I1 5 DDR2 64293 DES 2 ZNF772 4 IGF1 5 DNAJB5 127 TGFB1I15 LMOD1 64271 DES 2 LOC646973 4 IGF1 5 COL6A2 124 TGFB1I1 5 SORBS1 63359DES 2 LOC100128096 4 IGF1 5 TPM2 121 TGFB1I1 5 KCNMB1 61499 DES 2 MOAP14 IGF1 5 WFDC1 121 TGFB1I1 5 PGR 60803 DES 2 HIGD1A 4 IGF1 5 TNS1 112TGFB1I1 5 RBPMS 59947 DES 2 
DISC2 4 IGF1 5 DKK3 111 TGFB1I1 5 FLNC 59840DES 2 CYCS 4 IGF1 5 HSPB8 108 TGFB1I1 5 MYLK 58329 DES 2 ZSCAN22 4 IGF15 TSPAN18 103 TGFB1I1 5 FHL1 58303 DES 2 LOC646127 4 IGF1 5 MYH11 102TGFB1I1 5 FZD7 56889 DES 2 RRP15 4 IGF1 5 GEFT 90 TGFB1I1 5 EDNRA 56620DES 2 LOC100130357 4 IGF1 5 ITIH5 81 TGFB1I1 5 DKK3 56591 DES 2 YES1 4IGF1 5 PYGM 81 TGFB1I1 5 DES 54990 DES 2 MTFMT 4 IGF1 5 MCAM 78 TGFB1I15 PGM5 54713 DES 2 JOSD1 4 IGF1 5 MRVI1 75 TGFB1I1 5 LOC729468 53979 DES2 RHOF 4 IGF1 5 MYLK 68 TGFB1I1 5 SYNE1 53386 DES 2 LIN54 4 IGF1 5 CNN163 TGFB1I1 5 PGM5P2 53378 DES 2 LOC729142 4 IGF1 5 RBPMS2 63 TGFB1I1 5SPARCL1 52082 DES 2 GNG4 4 IGF1 5 ATP1A2 58 TGFB1I1 5 ACTG2 51556 DES 2H6PD 4 IGF1 5 LIMS2 58 TGFB1I1 5 TRPC4 51205 DES 2 FBXW2 4 IGF1 5 LMOD156 TGFB1I1 5 CAV1 49615 DES 2 NUP43 4 IGF1 5 GNAO1 46 TGFB1I1 5 GNAL49292 DES 2 WDR5B 4 IGF1 5 LGALS1 43 TGFB1I1 5 TIMP3 48293 DES 2 ANGEL24 IGF1 5 DAAM2 41 TGFB1I1 5 ABCC9 46190 DES 2 SGTB 4 IGF1 5 MRC2 39TGFB1I1 5 MRVI1 44926 DES 2 MAPK1IP1L 4 IGF1 5 HRNBP3 38 TGFB1I1 5 ACTN144120 DES 2 ZSCAN29 4 IGF1 5 ASB2 36 TGFB1I1 5 PALLD 43624 DES 2 FXC1 4IGF1 5 CLIP3 25 TGFB1I1 5 SERPINF1 43602 DES 2 NQO1 4 IGF1 5 C16orf45 22TGFB1I1 5 JAZF1 42715 DES 2 MOBKL1A 4 IGF1 5 DBNDD2 20 TGFB1I1 5 KANK242364 DES 2 ANAPC16 4 IGF1 5 RUSC2 19 TGFB1I1 5 HSPB8 41435 DES 2C16orf63 4 IGF1 5 RARRES2 18 TGFB1I1 5 MYL9 37460 DES 2 TBCCD1 4 IGF1 5ADRA1A 18 TGFB1I1 5 PRNP 33800 DES 2 DLEU2 4 IGF1 5 TINAGL1 17 TGFB1I1 5TSPAN18 33287 DES 2 CARD8 4 IGF1 5 SYNM 17 TGFB1I1 5 FRMD6 32935 DES 2LOC100130236 4 IGF1 5 TMEM35 14 TGFB1I1 5 CSRP1 32471 DES 2 LOC1001304424 IGF1 5 COPZ2 12 TGFB1I1 5 HEPH 32337 DES 2 CAMLG 4 IGF1 5 LTBP4 12TGFB1I1 5 NEXN 29867 DES 2 ZBTB3 4 IGF1 5 SCARA3 11 TGFB1I1 5 PRICKLE229746 DES 2 ZNF445 4 IGF1 5 NR2F1 11 TGFB1I1 5 PPAP2B 28983 DES 2 CASP84 IGF1 5 PCDH10 11 TGFB1I1 5 MYH11 28923 DES 2 RAB21 4 IGF1 5 RAB34 10TGFB1I1 5 PDGFC 28732 DES 2 ZC3HAV1L 4 IGF1 5 FOXF1 8 TGFB1I1 5 TPM127766 DES 2 SC5DL 4 IGF1 5 TCF7L1 7 TGFB1I1 5 SVIL 27521 DES 
2 KILLIN 4IGF1 5 KIRREL 6 TGFB1I1 5 LOC732446 27335 DES 2 MTX3 4 IGF1 5 DACT1 6TGFB1I1 5 MEIS2 25944 DES 2 KCNE4 4 IGF1 5 ZNF516 5 TGFB1I1 5 CALD125386 DES 2 GM2A 4 IGF1 5 EMILIN1 4 TGFB1I1 5 CNTN1 25377 DES 2LOC401588 4 IGF1 5 DCHS1 4 TGFB1I1 5 FERMT2 25146 DES 2 C8orf79 4 IGF1 5EHBP1L1 3 TGFB1I1 5 CLU 24888 DES 2 KIAA0754 4 IGF1 5 SYDE1 2 TGFB1I1 5SPON1 23171 DES 2 SMU1 4 IGF1 5 PPP1R14A 2 TGFB1I1 5 TGFBR3 23018 DES 2TSPYL1 4 IGF1 5 SMOC1 2 TGFB1I1 5 CACHD1 22496 DES 2 SPRED1 4 IGF1 5JPH2 1 TGFB1I1 5 TPM2 22108 DES 2 LOC100128997 4 IGF1 5 MICALL1 1TGFB1I1 5 GSN 22102 DES 2 LOC729652 4 IGF1 5 LCAT 1 TGFB1I1 5 NID2 21240DES 2 TRAPPC2 4 IGF1 5 HSPB6 1 TGFB1I1 5 MYOCD 21178 DES 2 KCTD10 4 IGF11 FLNA 33418 TPM2 5 MKX 20028 DES 2 DUSP19 4 IGF1 1 TAGLN 33391 TPM2 5EYA4 19967 DES 2 CCDC122 4 IGF1 1 TNS1 32975 TPM2 5 LOC100127983 18208DES 2 NXN 4 IGF1 1 CNN1 32489 TPM2 5 ANXA6 16600 DES 2 ZNF283 4 IGF1 1CHRDL1 31765 TPM2 5 HLF 16262 DES 2 SPATS2L 4 IGF1 1 LMOD1 31568 TPM2 5VWA5A 16175 DES 2 TRIM5 4 IGF1 1 MYLK 31444 TPM2 5 SRD5A2 16145 DES 2HAUS3 4 IGF1 1 ACTA2 31310 TPM2 5 SYNM 15943 DES 2 UTP11L 4 IGF1 1 ACTG230665 TPM2 5 CDC42EP3 14001 DES 2 SLC30A5 4 IGF1 1 KCNMB1 30331 TPM2 5AOC3 13787 DES 2 MBOAT1 4 IGF1 1 MSRB3 30007 TPM2 5 TIMP2 13760 DES 2TERF2 4 IGF1 1 SORBS1 29926 TPM2 5 ILK 13444 DES 2 VPS33A 4 IGF1 1DPYSL3 29802 TPM2 5 ADCY5 13346 DES 2 SENP5 4 IGF1 1 DES 29158 TPM2 5PARVA 13266 DES 2 EVI5 4 IGF1 1 VCL 29088 TPM2 5 FBLN1 12617 DES 2NDUFC2 4 IGF1 1 SLC8A1 29075 TPM2 5 LOC645954 12259 DES 2 ZBTB8A 4 IGF11 CCND2 28780 TPM2 5 FAT4 12247 DES 2 ST8SIA4 4 IGF1 1 MEIS1 28764 TPM25 ITIH5 11490 DES 2 C7orf64 4 IGF1 1 PGM5 28584 TPM2 5 COL6A3 10595 DES2 MED18 4 IGF1 1 ATP2B4 28495 TPM2 5 TSHZ3 10118 DES 2 MPV17L 4 IGF1 1LOC729468 28204 TPM2 5 MCAM 8671 DES 2 C1orf210 4 IGF1 1 FHL1 28101 TPM25 MAP1B 8478 DES 2 LIN7C 4 IGF1 1 FLNC 27926 TPM2 5 WFDC1 7000 DES 2KCNJ11 4 IGF1 1 PGM5P2 27789 TPM2 5 PDE5A 6648 DES 2 COX18 4 IGF1 1HSPB8 27438 TPM2 5 TLN1 5948 DES 2 PCBD2 4 IGF1 
1 DDR2 26679 TPM2 5PDLIM7 5715 DES 2 SPAST 4 IGF1 1 PGR 26409 TPM2 5 SPOCK3 5657 DES 2CYP4V2 4 IGF1 1 MRVI1 25979 TPM2 5 BOC 5611 DES 2 LRTOMT 4 IGF1 1 DKK325603 TPM2 5 CRYAB 5555 DES 2 IMPAD1 3 IGF1 1 RBPMS 24576 TPM2 5 PMP224795 DES 2 UBXN2B 3 IGF1 1 MYH11 24353 TPM2 5 ADRA1A 4611 DES 2 C5orf333 IGF1 1 FZD7 24298 TPM2 5 FGF2 4439 DES 2 FOXJ3 3 IGF1 1 TPM2 23458TPM2 5 CELF2 4392 DES 2 PPP1R15B 3 IGF1 1 GNAL 23091 TPM2 5 MMP2 4243DES 2 GNAI3 2 IGF1 1 MYL9 22987 TPM2 5 WWTR1 3966 DES 2 SAR1B 2 IGF1 1JAZF1 21665 TPM2 5 CAP2 3592 DES 2 SERPINB9 2 IGF1 1 CAV1 21569 TPM2 5LOC100129846 3236 DES 2 PTGIS 2 IGF1 1 KANK2 21564 TPM2 5 RBMS3 3165 DES2 C3orf70 2 IGF1 1 EDNRA 20876 TPM2 5 AOX1 3042 DES 2 RUNDC2B 2 IGF1 1SPARCL1 20468 TPM2 5 MFAP4 3011 DES 2 SYT11 1 IGF1 1 TRPC4 19698 TPM2 5TCF21 2881 DES 1 CPXM2 1 ITGA7 1 TSPAN18 18763 TPM2 5 MATN2 2851 DES 1MRVI1 1 ITGA7 1 ACTN1 18284 TPM2 5 MRGPRF 2724 DES 1 ITGA7 1 ITGA7 1TIMP3 18017 TPM2 5 POPDC2 2704 DES 2 ADCY5 661 ITGA7 1 ABCC9 17793 TPM25 CFL2 2404 DES 2 MRGPRF 652 ITGA7 1 SYNE1 17659 TPM2 5 LOC283904 2374DES 2 PDLIM7 649 ITGA7 1 SERPINF1 17306 TPM2 5 PRELP 2253 DES 2 FLNC 627ITGA7 1 PALLD 16659 TPM2 5 CCDC69 2088 DES 2 KANK2 624 ITGA7 1 PRICKLE216570 TPM2 5 PLN 2046 DES 2 MYL9 611 ITGA7 1 CSRP1 15853 TPM2 5 DNAJB51956 DES 2 AOC3 602 ITGA7 1 HEPH 14646 TPM2 5 GPR124 1851 DES 2 FLNA 540ITGA7 1 NEXN 13548 TPM2 5 GAS6 1830 DES 2 TAGLN 527 ITGA7 1 MYOCD 13479TPM2 5 TSPAN2 1830 DES 2 KCNMB1 492 ITGA7 1 MEIS2 13043 TPM2 5 ANGPT11797 DES 2 DES 491 ITGA7 1 TPM1 12988 TPM2 5 MFGE8 1766 DES 2 ITGA7 481ITGA7 1 SPON1 12334 TPM2 5 ITGA1 1682 DES 2 SLC24A3 434 ITGA7 1 EYA412112 TPM2 5 GSTM5 1596 DES 2 TNS1 423 ITGA7 1 HLF 11972 TPM2 5 MYADM1579 DES 2 TSPAN18 364 ITGA7 1 SYNM 11833 TPM2 5 CES1 1511 DES 2 MCAM351 ITGA7 1 SVIL 11249 TPM2 5 CAMK2G 1453 DES 2 TPM2 322 ITGA7 1 FRMD610974 TPM2 5 PCP4 1361 DES 2 MYLK 322 ITGA7 1 CNTN1 10796 TPM2 5 SLC24A31275 DES 2 HSPB8 317 ITGA7 1 CLU 10687 TPM2 5 RGN 1215 DES 2 MYH11 317ITGA7 1 
LOC100127983 10582 TPM2 5 KCNMA1 1050 DES 2 MRVI1 314 ITGA7 1PRNP 10088 TPM2 5 PDZRN4 876 DES 2 LMOD1 301 ITGA7 1 MKX 9903 TPM2 5ARHGAP10 867 DES 2 CNN1 288 ITGA7 1 CALD1 9712 TPM2 5 C6orf186 841 DES 2ITIH5 287 ITGA7 1 FERMT2 9315 TPM2 5 ARHGAP20 828 DES 2 DNAJB5 282 ITGA71 NID2 9290 TPM2 5 FXYD6 826 DES 2 CHRDL1 264 ITGA7 1 ITIH5 8936 TPM2 5PTGER2 802 DES 2 EFEMP2 256 ITGA7 1 PDGFC 8919 TPM2 5 SLC12A4 721 DES 2ATP1A2 239 ITGA7 1 LOC732446 8793 TPM2 5 NID1 670 DES 2 SMTN 238 ITGA7 1LOC645954 8764 TPM2 5 ITGA9 568 DES 2 GAS6 231 ITGA7 1 ADCY5 8698 TPM2 5SMTN 558 DES 2 WFDC1 222 ITGA7 1 AOC3 8557 TPM2 5 TCEAL2 557 DES 2TGFB1I1 220 ITGA7 1 SRD5A2 8415 TPM2 5 COL6A1 499 DES 2 GPR124 206 ITGA71 GSN 7427 TPM2 5 ITGA5 475 DES 2 NID2 204 ITGA7 1 WFDC1 6345 TPM2 5ATP1A2 417 DES 2 ADRA1A 197 ITGA7 1 VWA5A 6297 TPM2 5 C21orf63 408 DES 2PYGM 189 ITGA7 1 ILK 6243 TPM2 5 EFEMP2 389 DES 2 RASL12 186 ITGA7 1TGFBR3 5718 TPM2 5 PTPLA 366 DES 2 BOC 184 ITGA7 1 CDC42EP3 5544 TPM2 5ST5 364 DES 2 FZD7 174 ITGA7 1 TSHZ3 5478 TPM2 5 JAM3 350 DES 2 ACTG2172 ITGA7 1 FAT4 4923 TPM2 5 ITGA7 333 DES 2 PRICKLE2 157 ITGA7 1 PARVA4922 TPM2 5 LPP 320 DES 2 GEFT 156 ITGA7 1 MCAM 4880 TPM2 5 COL6A2 302DES 2 COL6A1 142 ITGA7 1 PDLIM7 4753 TPM2 5 ODZ3 294 DES 2 PGM5 133ITGA7 1 ADRA1A 4540 TPM2 5 PLEKHO1 266 DES 2 SYNM 132 ITGA7 1 ANXA6 4499TPM2 5 PYGM 249 DES 2 FHL1 126 ITGA7 1 FBLN1 4133 TPM2 5 TINAGL1 239 DES2 HEPH 112 ITGA7 1 BOC 3515 TPM2 5 PCDH10 238 DES 2 COL6A2 110 ITGA7 1COL6A3 3490 TPM2 5 PNMA1 232 DES 2 LOC729468 109 ITGA7 1 CRYAB 3436 TPM25 ACACB 221 DES 2 MYOCD 101 ITGA7 1 SPOCK3 3141 TPM2 5 RASL12 213 DES 2ACTA2 66 ITGA7 1 PDE5A 2530 TPM2 5 LARGE 182 DES 2 RBPMS2 62 ITGA7 1MAP1B 2406 TPM2 5 GEFT 181 DES 2 LIMS2 53 ITGA7 1 FGF2 2375 TPM2 5 NCS1176 DES 2 GNAO1 45 ITGA7 1 LOC100129846 2231 TPM2 5 TRANK1 173 DES 2ASB2 44 ITGA7 1 MRGPRF 2029 TPM2 5 FGFR1 166 DES 2 HRNBP3 43 ITGA7 1DNAJB5 2029 TPM2 5 AHNAK2 164 DES 2 POPDC2 41 ITGA7 1 LOC283904 2007TPM2 5 LGALS1 156 DES 2 DAAM2 38 ITGA7 1 
POPDC2 1965 TPM2 5 RRAS 133 DES2 ODZ3 34 ITGA7 1 TCF21 1785 TPM2 5 C2orf40 132 DES 2 PDZRN4 33 ITGA7 1TLN1 1720 TPM2 5 TGFB1I1 126 DES 2 C6orf186 30 ITGA7 1 CELF2 1700 TPM2 5RAB34 95 DES 2 ITGA9 28 ITGA7 1 AOX1 1459 TPM2 5 PTRF 94 DES 2 NID1 27ITGA7 1 SLC24A3 1296 TPM2 5 SCHIP1 91 DES 2 C16orf45 22 ITGA7 1 CCDC691287 TPM2 5 GSTM2 87 DES 2 RUSC2 22 ITGA7 1 ANGPT1 1256 TPM2 5 MAOB 49DES 2 TMEM35 19 ITGA7 1 PCP4 1226 TPM2 5 MASP1 48 DES 2 CLIP3 19 ITGA7 1BNC2 1170 TPM2 5 TRIP10 45 DES 2 MRC2 19 ITGA7 1 PDZRN4 1069 TPM2 5RARRES2 40 DES 2 TINAGL1 17 ITGA7 1 RGN 1065 TPM2 5 RBPMS2 37 DES 2DBNDD2 17 ITGA7 1 CES1 1060 TPM2 5 APOBEC3C 30 DES 2 ITGB3 11 ITGA7 1GPR124 917 TPM2 5 COPZ2 29 DES 2 LDB3 9 ITGA7 1 GAS6 888 TPM2 5 CACNA1C21 DES 2 ITGA5 9 ITGA7 1 CFL2 871 TPM2 5 GNAO1 16 DES 2 NCS1 9 ITGA7 1CAMK2G 869 TPM2 5 UST 12 DES 2 FOXF1 8 ITGA7 1 ARHGAP20 850 TPM2 5 ACTC112 DES 2 DACT1 7 ITGA7 1 GSTM5 794 TPM2 5 CES4 11 DES 2 CSPG4 6 ITGA7 1CAP2 752 TPM2 5 ID4 11 DES 2 JPH2 6 ITGA7 1 PRELP 693 TPM2 5 C16orf45 10DES 2 ZNF516 6 ITGA7 1 SMTN 540 TPM2 5 LIMS2 9 DES 2 KIRREL 3 ITGA7 1FXYD6 533 TPM2 5 GSTM5P1 6 DES 2 NHSL2 3 ITGA7 1 TSPAN2 500 TPM2 5 GSTM45 DES 2 LCAT 2 ITGA7 1 KCNMA1 488 TPM2 5 CBX7 3 DES 2 FABP3 2 ITGA7 1PTGER2 429 TPM2 5 PPP1R14A 3 DES 2 GNAZ 1 ITGA7 1 TCEAL2 425 TPM2 5FABP3 3 DES 2 P2RX1 1 ITGA7 1 MYADM 402 TPM2 5 GSTM1 2 DES 1 SLC8A147139 SRD5A2 1 JAM3 360 TPM2 5 GSTM2P1 2 DES 1 LOC729468 47056 SRD5A2 1COL6A1 354 TPM2 5 HSPB6 1 DES 1 DPYSL3 47002 SRD5A2 1 ATP1A2 339 TPM2 1GSTM5P1 4 GSTM1 1 ACTA2 46967 SRD5A2 1 SLC12A4 327 TPM2 1 GSTM2 4 GSTM11 PGM5 46874 SRD5A2 1 ITGA5 325 TPM2 1 GSTM4 4 GSTM1 1 MEIS1 46871SRD5A2 1 ITGA9 301 TPM2 1 GSTM5 4 GSTM1 1 ACTG2 46703 SRD5A2 1 ITGA7 300TPM2 1 SPOCK3 3 GSTM1 1 PGM5P2 46699 SRD5A2 1 EFEMP2 298 TPM2 1 PGM5 3GSTM1 1 MSRB3 46428 SRD5A2 1 PYGM 254 TPM2 1 HSPB8 3 GSTM1 1 TAGLN 46404SRD5A2 1 COL6A2 248 TPM2 1 AOX1 2 GSTM1 1 FLNA 46365 SRD5A2 1 ARHGAP10227 TPM2 1 CSRP1 2 GSTM1 1 VCL 46278 SRD5A2 1 PNMA1 211 TPM2 1 FLNC 2GSTM1 1 
CNN1 45892 SRD5A2 1 RASL12 207 TPM2 1 DES 2 GSTM1 1 CHRDL1 45879SRD5A2 1 GEFT 194 TPM2 1 GSTM2P1 2 GSTM1 1 TNS1 45774 SRD5A2 1 PTPLA 183TPM2 1 GSTM1 2 GSTM1 1 ATP2B4 45519 SRD5A2 1 CRISPLD2 181 TPM2 1 CAV1 1GSTM1 1 LMOD1 44299 SRD5A2 1 ACSS3 177 TPM2 1 SRD5A2 1 GSTM1 1 PGR 44126SRD5A2 1 AHNAK2 175 TPM2 1 GSTM3 1 GSTM1 1 SORBS1 43839 SRD5A2 1 ST5 175TPM2 1 LOC729468 1 GSTM1 1 CCND2 43401 SRD5A2 1 PLEKHO1 165 TPM2 1 EYA41 GSTM1 1 DDR2 43389 SRD5A2 1 LARGE 164 TPM2 1 PGM5P2 1 GSTM1 1 EDNRA42947 SRD5A2 1 C21orf63 151 TPM2 1 CSRP1 364 GSTM2 1 FHL1 41382 SRD5A2 1TINAGL1 150 TPM2 1 CAV1 358 GSTM2 1 KCNMB1 41204 SRD5A2 1 ACACB 138 TPM21 TNS1 358 GSTM2 1 TRPC4 40884 SRD5A2 1 LGALS1 136 TPM2 1 ATP2B4 356GSTM2 1 SYNE1 40118 SRD5A2 1 TGFB1I1 126 TPM2 1 MEIS2 352 GSTM2 1 CAV139836 SRD5A2 1 ITGB3 123 TPM2 1 FLNA 350 GSTM2 1 SPARCL1 39359 SRD5A2 1RRAS 123 TPM2 1 TAGLN 350 GSTM2 1 RBPMS 38414 SRD5A2 1 NCS1 107 TPM2 1GNAL 350 GSTM2 1 FZD7 34246 SRD5A2 1 PTRF 94 TPM2 1 DPYSL3 348 GSTM2 1SRD5A2 33968 SRD5A2 1 LPP 91 TPM2 1 MEIS1 347 GSTM2 1 DKK3 33963 SRD5A21 C2orf40 90 TPM2 1 TRPC4 345 GSTM2 1 JAZF1 33635 SRD5A2 1 MAOB 59 TPM21 CCND2 325 GSTM2 1 MYLK 33158 SRD5A2 1 GSTM2 51 TPM2 1 SYNE1 321 GSTM21 ABCC9 33072 SRD5A2 1 TRIP10 48 TPM2 1 EDNRA 317 GSTM2 1 GNAL 32392SRD5A2 1 ALDH1A2 43 TPM2 1 ACTA2 313 GSTM2 1 PALLD 31713 SRD5A2 1RARRES2 36 TPM2 1 PALLD 310 GSTM2 1 FLNC 29309 SRD5A2 1 COPZ2 34 TPM2 1FRMD6 309 GSTM2 1 PRICKLE2 29168 SRD5A2 1 APOBEC3C 34 TPM2 1 PGM5 301GSTM2 1 MRVI1 28467 SRD5A2 1 RBPMS2 33 TPM2 1 HSPB8 293 GSTM2 1 TIMP328313 SRD5A2 1 DBNDD2 31 TPM2 1 ACTG2 287 GSTM2 1 FRMD6 28108 SRD5A2 1GNAO1 19 TPM2 1 CNN1 286 GSTM2 1 PRNP 28106 SRD5A2 1 ACTC1 15 TPM2 1PGM5P2 285 GSTM2 1 HSPB8 26756 SRD5A2 1 CES4 11 TPM2 1 SLC8A1 282 GSTM21 PDGFC 26571 SRD5A2 1 C16orf45 10 TPM2 1 LOC729468 276 GSTM2 1 CNTN126148 SRD5A2 1 GSTP1 8 TPM2 1 PRICKLE2 275 GSTM2 1 EYA4 26129 SRD5A2 1UST 5 TPM2 1 SRD5A2 275 GSTM2 1 MEIS2 25616 SRD5A2 1 GSTM4 4 TPM2 1RBPMS 275 GSTM2 1 MYOCD 25477 SRD5A2 1 
GSTM5P1 4 TPM2 1 PDGFC 270 GSTM21 NEXN 25068 SRD5A2 1 CBX7 3 TPM2 1 EYA4 270 GSTM2 1 CACHD1 25049 SRD5A21 PPP1R14A 3 TPM2 1 MYOCD 262 GSTM2 1 FERMT2 24208 SRD5A2 1 FABP3 3 TPM21 CALD1 255 GSTM2 1 LOC100127983 23635 SRD5A2 1 C15orf51 1 TPM2 1 KCNMB1250 GSTM2 1 TPM1 22943 SRD5A2 1 GSTM2P1 1 TPM2 1 ACTN1 227 GSTM2 1 CALD122765 SRD5A2 1 FZD7 216 GSTM2 1 SERPINF1 22186 SRD5A2 1 LOC100127983 216GSTM2 1 CSRP1 21728 SRD5A2 1 DKK3 209 GSTM2 1 ACTN1 21590 SRD5A2 1 GSTM5207 GSTM2 1 HLF 21402 SRD5A2 1 CHRDL1 204 GSTM2 1 DES 20952 SRD5A2 1SORBS1 202 GSTM2 1 MYL9 19970 SRD5A2 1 SPOCK3 202 GSTM2 1 HEPH 19688SRD5A2 1 JAZF1 189 GSTM2 1 TSPAN18 19099 SRD5A2 1 LMOD1 180 GSTM2 1 SVIL18819 SRD5A2 1 DES 172 GSTM2 1 TGFBR3 18423 SRD5A2 1 MYH11 18066 SRD5A21 KANK2 17638 SRD5A2 1 CDC42EP3 16173 SRD5A2

TABLE 10 StackID Coexpressed Gene ProbeWt Seeding Gene 1 NCAPH 9758CDC20 1 CDC20 9758 CDC20 1 IQGAP3 9758 CDC20 1 ESPL1 9674 CDC20 1 CENPA9671 CDC20 1 POC1A 9671 CDC20 1 KIF18B 9328 CDC20 1 WDR62 9316 CDC20 1TROAP 9178 CDC20 1 ADAMTS7 8987 CDC20 1 PKMYT1 8875 CDC20 1 SLC2A6 8875CDC20 1 FNDC1 8554 CDC20 1 FAM64A 8346 CDC20 1 FAM131B 8322 CDC20 1PNLDC1 8135 CDC20 1 KIFC1 7598 CDC20 1 C9orf100 7547 CDC20 1 RPS6KL17527 CDC20 1 MRAP 7521 CDC20 1 AURKB 7224 CDC20 1 C2orf54 7150 CDC20 1TMEM163 6853 CDC20 1 KRBA1 6846 CDC20 1 ZMYND10 6825 CDC20 1 LOC5414736824 CDC20 1 SLC6A1 6669 CDC20 1 DQX1 6601 CDC20 1 BAI2 6583 CDC20 1EME1 6533 CDC20 1 CICP3 6481 CDC20 1 PPFIA4 6480 CDC20 1 PADI1 6458CDC20 1 SSPO 6431 CDC20 1 GABRB2 6422 CDC20 1 IRF5 6399 CDC20 1 NXPH16399 CDC20 1 ZIC1 6346 CDC20 1 SLC6A20 6336 CDC20 1 PKD1L1 6281 CDC20 1BIRC5 6278 CDC20 1 AQP10 6251 CDC20 1 ABCA4 6216 CDC20 1 TFR2 6181 CDC201 LOC646070 6179 CDC20 1 CSPG5 6165 CDC20 1 CENPM 6124 CDC20 1 EFNA36100 CDC20 1 GPC2 6078 CDC20 1 HYAL3 6047 CDC20 1 CELA3B 6031 CDC20 1LOC100287112 6015 CDC20 1 SRCRB4D 5999 CDC20 1 DNAJB3 5993 CDC20 1 PADI35989 CDC20 1 PAX8 5971 CDC20 1 AIM1L 5971 CDC20 1 FAM131C 5915 CDC20 1PRRT4 5915 CDC20 1 MLXIPL 5915 CDC20 1 E2F1 5912 CDC20 1 E2F7 5893 CDC201 RAD54L 5888 CDC20 1 C1orf81 5813 CDC20 1 NFKBIL2 5742 CDC20 1LOC729061 5728 CDC20 1 TAS1R3 5671 CDC20 1 VWA3B 5643 CDC20 1 MYBL2 5565CDC20 1 TTLL6 5531 CDC20 1 LOC100130097 5525 CDC20 1 CHRNG 5491 CDC20 1TTBK1 5491 CDC20 1 TRIM46 5491 CDC20 1 MST1R 5491 CDC20 1 EXOC3L 5474CDC20 1 TH 5474 CDC20 1 CHST1 5474 CDC20 1 LOC442676 5439 CDC20 1 CNTN25435 CDC20 1 DPYSL5 5435 CDC20 1 C3orf20 5368 CDC20 1 NPC1L1 5291 CDC201 CICP5 5281 CDC20 1 KLRG2 5275 CDC20 1 CCDC108 5275 CDC20 1 IL28B 5217CDC20 1 CELSR3 5166 CDC20 1 RNFT2 5138 CDC20 1 C17orf53 5114 CDC20 1TRPC2 5095 CDC20 1 KCNA1 5078 CDC20 1 C8G 4946 CDC20 1 COL11A1 4685CDC20 1 C1orf222 4673 CDC20 1 SLC6A12 4633 CDC20 1 HCN3 4608 CDC20 1GTSE1 4528 CDC20 1 ORC1L 4497 CDC20 1 STX1A 4475 CDC20 1 
MFSD2A 4451CDC20 1 BEST4 4389 CDC20 1 CACNA1E 4299 CDC20 1 KLHDC7A 4297 CDC20 1MAPK15 4272 CDC20 1 GHRHR 4211 CDC20 1 KEL 4155 CDC20 1 C2orf62 4113CDC20 1 ANXA9 4063 CDC20 1 RAET1G 4059 CDC20 1 GPR88 3913 CDC20 1 F123749 CDC20 1 LYPD1 3681 CDC20 1 C2orf70 3665 CDC20 1 ABCB9 3638 CDC20 1MSLNL 3589 CDC20 1 CDC25C 3573 CDC20 1 CELA3A 3551 CDC20 1 AQP12B 3551CDC20 1 NEU4 3551 CDC20 1 KIF2C 3541 CDC20 1 NEIL3 3426 CDC20 1 NUDT173399 CDC20 1 ULBP2 3395 CDC20 1 KIF17 3341 CDC20 1 ARHGEF19 3340 CDC20 1CYP4A22 3317 CDC20 1 CYP4A11 3317 CDC20 1 SCNN1D 3311 CDC20 1 FRMD1 3219CDC20 1 FAM179A 3194 CDC20 1 NDUFA4L2 3109 CDC20 1 LCE2D 2984 CDC20 1ODZ4 2936 CDC20 1 ABCC12 2809 CDC20 1 DPF1 2750 CDC20 1 CDH24 2653 CDC201 LOC154449 2641 CDC20 1 KIF21B 2534 CDC20 1 SEMA5B 2499 CDC20 1PSORS1C2 2497 CDC20 1 FCRL4 2434 CDC20 1 FUT6 2313 CDC20 1 TRAIP 2258CDC20 1 E2F8 2232 CDC20 1 SLC38A3 2199 CDC20 1 CBX2 2174 CDC20 1 CDCA52130 CDC20 1 DUSP5P 2080 CDC20 1 GPAT2 1997 CDC20 1 AVPR1B 1991 CDC20 1MGC50722 1990 CDC20 1 AQP12A 1983 CDC20 1 C6orf222 1965 CDC20 1 PRAMEF191965 CDC20 1 PRAMEF18 1965 CDC20 1 SLC5A9 1965 CDC20 1 FCN3 1955 CDC20 1GCM2 1910 CDC20 1 ADORA3 1862 CDC20 1 PLA2G2F 1821 CDC20 1 C6orf25 1765CDC20 1 CDC45 1681 CDC20 1 AGXT 1529 CDC20 1 KIF25 1507 CDC20 1 ZDHHC191507 CDC20 1 APLNR 1374 CDC20 1 TACC3 1220 CDC20 1 TK1 1063 CDC20 1C15orf42 1052 CDC20 1 FANCA 990 CDC20 1 GINS4 932 CDC20 1 MCM10 757CDC20 1 CYB561D1 748 CDC20 1 FUT5 687 CDC20 1 POLQ 632 CDC20 1 LOC643988621 CDC20 1 RAD51 601 CDC20 1 DGAT2 582 CDC20 1 KIF24 566 CDC20 1 CDCA3442 CDC20 1 CLSPN 381 CDC20 1 ESYT3 356 CDC20 1 EXO1 278 CDC20 1 CDCA2186 CDC20 1 CKAP2L 159 CDC20 1 FOXM1 157 CDC20 1 FEN1 136 CDC20 1 UHRF1125 CDC20 1 KIF20A 110 CDC20 1 ESCO2 107 CDC20 1 CA2 100 CDC20 1 PLK1 84CDC20 1 PTTG1 64 CDC20 1 KIF14 53 CDC20 1 CIT 42 CDC20 1 FAM54A 39 CDC201 CDCA8 28 CDC20 1 DEPDC1B 12 CDC20 1 MYBL2 12208 MYBL2 1 BIRC5 12208MYBL2 1 TROAP 12208 MYBL2 1 ESPL1 12190 MYBL2 1 WDR62 12032 MYBL2 1KIF18B 12015 MYBL2 1 FAM64A 
11915 MYBL2 1 PKMYT1 11774 MYBL2 1 SLC2A611774 MYBL2 1 GTSE1 11356 MYBL2 1 E2F1 11062 MYBL2 1 AURKB 11010 MYBL2 1RNFT2 10720 MYBL2 1 CENPM 10651 MYBL2 1 CENPA 10628 MYBL2 1 POC1A 10289MYBL2 1 FDXR 10285 MYBL2 1 NFKBIL2 10214 MYBL2 1 E2F7 10195 MYBL2 1C9orf100 10103 MYBL2 1 CDH24 10094 MYBL2 1 ABCB9 10079 MYBL2 1 NDUFA4L29961 MYBL2 1 ADAMTS7 9614 MYBL2 1 MAST1 9313 MYBL2 1 GABBR2 9262 MYBL2 1MYH7B 8759 MYBL2 1 DNAH3 8637 MYBL2 1 TTLL6 8619 MYBL2 1 ZFHX2 8592MYBL2 1 CDC20 8589 MYBL2 1 RASAL1 8452 MYBL2 1 NCAPH 8273 MYBL2 1 IQGAP38245 MYBL2 1 DNAH2 8219 MYBL2 1 LOC400499 8151 MYBL2 1 CHST1 8037 MYBL21 ATP4A 7868 MYBL2 1 TH 7731 MYBL2 1 EXOC3L 7603 MYBL2 1 E2F8 7590 MYBL21 MMP11 7465 MYBL2 1 CELP 7344 MYBL2 1 CDCA5 7024 MYBL2 1 FAM131B 6981MYBL2 1 C14orf73 6896 MYBL2 1 FBXW9 6802 MYBL2 1 PLEKHG6 6725 MYBL2 1FNDC1 6720 MYBL2 1 SEZ6 6515 MYBL2 1 FCHO1 6413 MYBL2 1 APLNR 6402 MYBL21 ALAS2 6382 MYBL2 1 VSX1 6360 MYBL2 1 LOC197350 6312 MYBL2 1 DPF1 6205MYBL2 1 CDC45 6026 MYBL2 1 C11orf9 6020 MYBL2 1 EME1 6010 MYBL2 1ADAMTS13 5896 MYBL2 1 TMEM145 5896 MYBL2 1 C8G 5840 MYBL2 1 CBX2 5838MYBL2 1 TMEM210 5659 MYBL2 1 CCDC135 5593 MYBL2 1 ADAMTS14 5571 MYBL2 1ITGA2B 5337 MYBL2 1 POLD1 5286 MYBL2 1 PNLDC1 5146 MYBL2 1 UCP3 5123MYBL2 1 FANCA 5068 MYBL2 1 MSLNL 5061 MYBL2 1 TEPP 4930 MYBL2 1 LRRC16B4901 MYBL2 1 CACNA1F 4901 MYBL2 1 EFNB3 4887 MYBL2 1 MYBPC2 4851 MYBL2 1FUT6 4847 MYBL2 1 CDH15 4847 MYBL2 1 HAL 4809 MYBL2 1 PGA3 4720 MYBL2 1PGA4 4720 MYBL2 1 C17orf53 4717 MYBL2 1 UMODL1 4713 MYBL2 1 OTOG 4690MYBL2 1 DBH 4661 MYBL2 1 POM121L9P 4629 MYBL2 1 DNAJB13 4394 MYBL2 1 TK14360 MYBL2 1 C9orf117 4336 MYBL2 1 RHBDL1 4308 MYBL2 1 MUC5B 4283 MYBL21 SPAG4 4276 MYBL2 1 GOLGA7B 4111 MYBL2 1 APOB48R 4107 MYBL2 1 IQCD 3984MYBL2 1 FUT5 3977 MYBL2 1 AIFM3 3973 MYBL2 1 LOC390595 3868 MYBL2 1CYP27B1 3833 MYBL2 1 SUSD2 3824 MYBL2 1 TGM6 3767 MYBL2 1 CDCA3 3765MYBL2 1 C20orf151 3706 MYBL2 1 C11orf41 3650 MYBL2 1 C9orf98 3636 MYBL21 KRT24 3589 MYBL2 1 ABCC12 3582 MYBL2 1 B3GNT4 3569 MYBL2 1 
AZI1 3556MYBL2 1 RLTPR 3427 MYBL2 1 KIF24 3264 MYBL2 1 DERL3 3232 MYBL2 1 LIPE3221 MYBL2 1 TTLL9 3196 MYBL2 1 SEC1 3196 MYBL2 1 ADAM8 3185 MYBL2 1SLC25A19 3136 MYBL2 1 PRSS27 3136 MYBL2 1 ODF3L2 3094 MYBL2 1 ODZ4 3034MYBL2 1 RAD54L 2936 MYBL2 1 KCNE1L 2936 MYBL2 1 SBF1P1 2915 MYBL2 1AIPL1 2868 MYBL2 1 UNC13A 2862 MYBL2 1 REM2 2832 MYBL2 1 KIFC1 2808MYBL2 1 TSNAXIP1 2799 MYBL2 1 LOC390660 2767 MYBL2 1 SLC6A12 2762 MYBL21 WDR16 2723 MYBL2 1 ACR 2710 MYBL2 1 TMPRSS13 2672 MYBL2 1 C15orf422659 MYBL2 1 DNMT3B 2649 MYBL2 1 UNC13D 2610 MYBL2 1 SYT5 2544 MYBL2 1PAX2 2462 MYBL2 1 PRCD 2426 MYBL2 1 PPFIA3 2421 MYBL2 1 GCGR 2338 MYBL21 CACNG3 2289 MYBL2 1 LAIR2 2233 MYBL2 1 MCM10 2178 MYBL2 1 C2orf54 2172MYBL2 1 LOC400419 2138 MYBL2 1 RINL 2136 MYBL2 1 DKFZp451A211 2118 MYBL21 LAMA1 2060 MYBL2 1 C9orf169 2060 MYBL2 1 CATSPER1 2001 MYBL2 1 OPCML1896 MYBL2 1 C9orf50 1852 MYBL2 1 DOC2GP 1760 MYBL2 1 TACC3 1665 MYBL2 1APOBEC3A 1632 MYBL2 1 LOC728307 1606 MYBL2 1 PDIA2 1572 MYBL2 1 LTB4R21419 MYBL2 1 OIP5 1393 MYBL2 1 ORC1L 1340 MYBL2 1 GSG2 1268 MYBL2 1 FSD11256 MYBL2 1 CDC25C 1228 MYBL2 1 KSR2 1183 MYBL2 1 DGAT2 1183 MYBL2 1KIF2C 1180 MYBL2 1 RAD51 1178 MYBL2 1 FNDC8 1178 MYBL2 1 RAB3IL1 991MYBL2 1 UHRF1 936 MYBL2 1 ENO4 855 MYBL2 1 C10orf105 780 MYBL2 1 NEIL3733 MYBL2 1 PPBP 672 MYBL2 1 PROCA1 671 MYBL2 1 TMEM132A 647 MYBL2 1DHRS2 548 MYBL2 1 PLK1 523 MYBL2 1 GINS4 485 MYBL2 1 CEL 480 MYBL2 1ZNF367 406 MYBL2 1 FOXM1 402 MYBL2 1 POLQ 319 MYBL2 1 ADAM12 312 MYBL2 1SEMA7A 284 MYBL2 1 HOXB5 137 MYBL2 1 EXO1 115 MYBL2 1 KIF4A 114 MYBL2 1FEN1 112 MYBL2 1 CLSPN 107 MYBL2 1 CIT 94 MYBL2 1 CDCA2 85 MYBL2 1 KIF4B68 MYBL2 1 PIK3R5 56 MYBL2 1 KIF20A 52 MYBL2 1 ZWINT 31 MYBL2 1 SPAG5 19MYBL2 1 ERCC6L 17 MYBL2 1 TPX2 11 TPX2 1 TOP2A 11 TPX2 1 NUSAP1 10 TPX21 MELK 7 TPX2 1 RACGAP1 6 TPX2 1 NCAPG 4 TPX2 1 MKI67 4 TPX2 1 CDKN3 4TPX2 1 PRC1 4 TPX2 1 ARHGAP11B 3 TPX2 1 KIAA0101 3 TPX2 1 ANLN 3 TPX2 1FAM111B 2 TPX2 1 RRM2 1 TPX2 1 KIF11 1 TPX2 1 PRR11 1 TPX2 1 CENPF 1TPX2 2 MKI67 41 TPX2 2 CASC5 
39 TPX2 2 ASPM 38 TPX2 2 KIF4A 36 TPX2 2DLGAP5 36 TPX2 2 KIF4B 36 TPX2 2 TPX2 33 TPX2 2 KIF14 31 TPX2 2 EXO1 31TPX2 2 SKA3 30 TPX2 2 SPAG5 27 TPX2 2 CIT 27 TPX2 2 BUB1 26 TPX2 2 CDKN326 TPX2 2 CENPF 25 TPX2 2 MELK 20 TPX2 2 ANLN 19 TPX2 2 BUB1B 18 TPX2 2UBE2C 17 TPX2 2 CEP55 16 TPX2 2 KIF20A 15 TPX2 2 DEPDC1B 15 TPX2 2 DTL14 TPX2 2 UBE2T 13 TPX2 2 NCAPG 13 TPX2 2 PBK 13 TPX2 2 DIAPH3 10 TPX2 2KIF23 6 TPX2 2 FOXM1 5 TPX2 2 RRM2 3 TPX2 2 SGOL1 2 TPX2 2 PLK1 2 TPX2 2CCNA2 2 TPX2 2 CDK1 2 TPX2 2 NUSAP1 1 TPX2

TABLE 11 StackID Coexpressed Gene ProbeWt Seeding Gene 1 NNT 1 DUSP1 1RNF180 1 DUSP1 1 PCDH18 1 DUSP1 2 RNF180 1 DUSP1 2 DUSP1 1 DUSP1 2PCDH18 1 DUSP1 3 ACTB 1 DUSP1 3 RHOB 1 DUSP1 3 DUSP1 1 DUSP1 4 ACTB 1DUSP1 4 DUSP1 1 DUSP1 4 CRTAP 1 DUSP1 5 RNF180 1 DUSP1 5 DUSP1 1 DUSP1 5CRTAP 1 DUSP1 5 PAM 1 DUSP1 6 DUSP1 8 DUSP1 6 NR4A1 7 DUSP1 6 FOS 7DUSP1 6 EGR1 5 DUSP1 6 BTG2 5 DUSP1 6 FOSB 5 DUSP1 6 JUN 4 DUSP1 6 NR4A23 DUSP1 6 TIPARP 3 DUSP1 6 CYR61 3 DUSP1 6 ATF3 2 DUSP1 6 RHOB 2 DUSP1 6NEDD9 2 DUSP1 6 MCL1 1 DUSP1 6 RASD1 1 DUSP1 1 JUNB 1 EGR1 1 TIPARP 1EGR1 1 BTG2 1 EGR1 2 JUNB 1 EGR1 2 BTG2 1 EGR1 2 EGR1 1 EGR1 3 KLF4 1EGR1 3 FOSB 1 EGR1 3 EGR1 1 EGR1 4 FOSB 1 EGR1 4 CSRNP1 1 EGR1 4 EGR1 1EGR1 5 EGR1 35 EGR1 5 FOS 30 EGR1 5 NR4A1 25 EGR1 5 FOSB 23 EGR1 5 BTG222 EGR1 5 CYR61 20 EGR1 5 ZFP36 18 EGR1 5 CSRNP1 17 EGR1 5 NR4A3 13 EGR15 EGR3 13 EGR1 5 KLF6 12 EGR1 5 RHOB 11 EGR1 5 DUSP1 10 EGR1 5 ATF3 9EGR1 5 JUN 9 EGR1 5 TIPARP 8 EGR1 5 NFKBIZ 7 EGR1 5 NR4A2 7 EGR1 5 JUNB7 EGR1 5 IER2 7 EGR1 5 MCL1 4 EGR1 5 KLF4 4 EGR1 5 EGR2 4 EGR1 5 NEDD9 2EGR1 5 SRF 2 EGR1 5 GADD45B 1 EGR1 5 TRIB1 1 EGR1 1 FOS 14 FOS 1 BTG2 14FOS 1 CSRNP1 13 FOS 1 ZFP36 13 FOS 1 JUNB 9 FOS 1 NR4A3 7 FOS 1 FOSB 7FOS 1 SIK1 6 FOS 1 BHLHE40 6 FOS 1 RHOB 5 FOS 1 TIPARP 5 FOS 1 KLF6 5FOS 1 MCL1 5 FOS 1 NR4A1 4 FOS 1 EGR1 4 FOS 1 NR4A2 4 FOS 1 GADD45B 3FOS 1 SOCS3 2 FOS 1 NFKBIZ 1 FOS 2 FOS 24 FOS 2 FOSB 22 FOS 2 EGR1 20FOS 2 NR4A1 19 FOS 2 BTG2 18 FOS 2 ZFP36 12 FOS 2 CSRNP1 11 FOS 2 CYR6110 FOS 2 DUSP1 8 FOS 2 ATF3 8 FOS 2 IER2 7 FOS 2 RHOB 6 FOS 2 TIPARP 6FOS 2 NR4A2 6 FOS 2 JUN 6 FOS 2 JUNB 6 FOS 2 EGR3 5 FOS 2 NR4A3 4 FOS 2KLF6 3 FOS 2 PPP1R15A 2 FOS 2 NEDD9 2 FOS 2 KLF4 2 FOS 2 EGR2 2 FOS 2MCL1 1 FOS 1 EMP1 7 GADD45B 1 BHLHE40 7 GADD45B 1 SOCS3 7 GADD45B 1NR4A3 4 GADD45B 1 FOSL2 3 GADD45B 1 GADD45B 3 GADD45B 1 RNF122 3 GADD45B1 KLF10 3 GADD45B 1 CSRNP1 3 GADD45B 1 SLC2A3 2 GADD45B 1 ZFP36 1GADD45B 2 FOSB 2 GADD45B 2 NR4A1 2 GADD45B 2 FOS 2 GADD45B 2 GADD45B 2GADD45B 2 BTG2 2 GADD45B 2 NR4A3 2 GADD45B 
2 JUNB 2 GADD45B 2 EGR1 2GADD45B 2 CSRNP1 2 GADD45B 2 ZFP36 2 GADD45B 2 RHOB 1 GADD45B 2 EGR3 1GADD45B 2 ATF3 1 GADD45B 3 GADD45B 4 GADD45B 3 JUNB 4 GADD45B 3 CSRNP1 4GADD45B 3 ZFP36 4 GADD45B 3 SOCS3 4 GADD45B 3 RHOB 3 GADD45B 3 BHLHE40 3GADD45B 3 FOS 2 GADD45B 3 FOSL2 2 GADD45B 3 BTG2 2 GADD45B 3 NR4A3 2GADD45B 3 FOSB 1 GADD45B 3 IRF1 1 GADD45B 1 FOSL2 1 ZFP36 1 HBEGF 1ZFP36 1 BHLHE40 1 ZFP36 2 HBEGF 1 ZFP36 2 NR4A3 1 ZFP36 2 BHLHE40 1ZFP36 3 CSRNP1 53 ZFP36 3 ZFP36 49 ZFP36 3 JUNB 29 ZFP36 3 FOS 26 ZFP363 BHLHE40 24 ZFP36 3 BTG2 24 ZFP36 3 FOSB 20 ZFP36 3 NR4A3 18 ZFP36 3SOCS3 18 ZFP36 3 EGR1 16 ZFP36 3 RHOB 16 ZFP36 3 FOSL2 15 ZFP36 3 NR4A115 ZFP36 3 GADD45B 10 ZFP36 3 MYADM 9 ZFP36 3 KLF6 8 ZFP36 3 CYR61 8ZFP36 3 EGR3 8 ZFP36 3 EMP1 8 ZFP36 3 LMNA 7 ZFP36 3 TIPARP 7 ZFP36 3NR4A2 7 ZFP36 3 MCL1 6 ZFP36 3 SIK1 6 ZFP36 3 ATF3 6 ZFP36 3 CEBPD 5ZFP36 3 IER3 5 ZFP36 3 IER2 5 ZFP36 3 MAFF 4 ZFP36 3 IRF1 4 ZFP36 3RNF122 4 ZFP36 3 SRF 3 ZFP36 3 ERRFI1 2 ZFP36 3 SLC25A25 2 ZFP36 3CDKN1A 2 ZFP36 3 EGR2 2 ZFP36 3 KLF4 1 ZFP36

Example 4 Prospective Validation Study of RS27

Study Design and Statistical Methods

The RS27 algorithm in Table 4 was tested in a prospective clinical validation study that included 395 evaluable patients who had surgery for their prostate cancer between 1997 and 2010 at the University of California, San Francisco (UCSF). The patients had Low or Intermediate risk (by CAPRA), clinically localized prostate cancer and might have been reasonable candidates for active surveillance, but underwent RP at UCSF within 6 months of the diagnosis of prostate cancer by biopsy. No randomization for patient selection was performed. For each patient, prostate biopsy samples from one fixed, paraffin-embedded tissue (FPET) block containing one or more tumor-containing needle cores were evaluated.

To investigate whether there is a significant relationship between RS27 or any component of RS27 and adverse pathology at RP, multivariable and univariable multinomial logistic regression models were used, and p-values from likelihood-ratio (LR) tests of the null hypothesis that the odds ratio (OR) is one were reported. The multinomial logistic model was also used to calculate estimates, with 95% confidence intervals, of the probability of high-grade or non-organ-confined disease. To evaluate the relationship between RS27, baseline covariates, and combinations of these factors with high-grade or non-organ-confined disease, multivariable and univariable binary logistic regression models were used, and p-values from likelihood-ratio tests of the null hypothesis that the odds ratio (OR) is one were reported.
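As an illustration of how a likelihood-ratio statistic maps to a p-value, the following minimal sketch computes the upper-tail probability of a chi-square distribution with integer degrees of freedom, which is the conventional reference distribution for an LR test. The function name `chi2_sf` is hypothetical and the closed-form series is a standard textbook identity; this is not the software used in the study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Illustrative helper (hypothetical name); uses closed-form finite series
    that hold for integer degrees of freedom.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    if x == 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series in sqrt(x)
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total
```

For a single covariate in a multinomial model with six outcome categories, the test has 5 degrees of freedom (one coefficient per non-reference category); in a binary model it has 1.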

The primary endpoint was formulated as follows:

TABLE 12 Clinical Endpoint: RP Grade and Stage
RP Gleason Score                                            Pathologic T2 Stage   Pathologic T3 Stage
≤3 + 3                                                      1                     2
3 + 4                                                       3                     4
Major pattern 4 or minor pattern 5, or tertiary pattern 5   5                     6

where Gleason Score ≤3+3 and pT2 (denoted “1”) is the reference category and all other categories (2-6) are compared to the reference category.

Cell combinations of Table 12 evaluated in binary logistic regression models include the following:

Cells 2, 4, 6 vs. 1, 3, 5: Non-organ-confined disease

Cells 5, 6 vs. 1, 2, 3, 4: High-grade disease

Cells 2, 4, 5, 6 vs. 1 and 3: High-grade or non-organ-confined disease
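The cell groupings listed above can be sketched as a small lookup that turns a Table 12 cell number into the three binary endpoints. The function and key names are hypothetical and chosen only for illustration.

```python
# Table 12 cells that define each binary endpoint (taken from the text above).
NON_ORGAN_CONFINED_CELLS = {2, 4, 6}
HIGH_GRADE_CELLS = {5, 6}


def binary_endpoints(cell: int) -> dict:
    """Map a Table 12 cell (1-6) to the three binary endpoints.

    Hypothetical helper for illustration; not part of the described assay.
    """
    if cell not in range(1, 7):
        raise ValueError("cell must be an integer from 1 to 6")
    non_organ_confined = cell in NON_ORGAN_CONFINED_CELLS
    high_grade = cell in HIGH_GRADE_CELLS
    return {
        "non_organ_confined": non_organ_confined,
        "high_grade": high_grade,
        # Union of the two sets: cells 2, 4, 5, 6 vs. 1 and 3
        "high_grade_or_non_organ_confined": non_organ_confined or high_grade,
    }
```

Deriving the combined endpoint as the union of the other two keeps the three definitions consistent by construction.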

RS27 Algorithm

RS27 on a scale from 0 to 100 was derived from reference-normalized gene expression measurements as follows.

Unscaled RS27 (RS27u) was defined as in Table 4:

RS27u=0.735*ECM (Stromal Response) group−0.368*Migration (Cellular Organization) group−0.352*PSA (Androgen) group+0.095*Proliferation (TPX2)

Where:

ECM (Stromal Response) group score=0.527*BGN+0.457*COL1A1+0.156*SFRP4

Migration (Cellular Organization) group score=0.163*FLNC+0.504*GSN+0.421*TPM2+0.394*GSTM2

PSA (Androgen) group score=0.634*FAM13C+1.079*KLK2+0.642*AZGP1+0.997*SRD5A2 Thresh

Proliferation (TPX2) score=TPX2 Thresh

where the thresholded gene scores for SRD5A2 and TPX2 are calculated as follows:

$\text{SRD5A2 Thresh} = \begin{cases} 5.5 & \text{if SRD5A2} < 5.5 \\ \text{SRD5A2} & \text{otherwise} \end{cases}$

$\text{TPX2 Thresh} = \begin{cases} 5.0 & \text{if TPX2} < 5.0 \\ \text{TPX2} & \text{otherwise} \end{cases}$

RS27u is then rescaled to be between 0 and 100 as follows:

${RS}\; 27({scaled})\left\{ \begin{matrix}0 & {{{if}\mspace{11mu} 13.4 \times \left( {{{RS}\; 27u} + 10.5} \right)} < 0} \\{13.4 \times \left( {{{RS}\; 27u} + 10.5} \right)} & {{{if}\mspace{14mu} 0} \leq {13.4 \times \left( {{{RS}\; 27u} + 10.5} \right)} \leq 100} \\100 & {{{if}\mspace{14mu} 13.4 \times \left( {{{RS}\; 27u} + 10.5} \right)} > 100}\end{matrix} \right.$

Patients were classified into low, intermediate, and high RS27 groups using pre-specified cut-points defined in Table 13 below. These cut-points defined the boundaries between low and intermediate RS27 groups and between intermediate and high RS27 groups. The cut-points were derived from the discovery study with the intent of identifying substantial proportions of patients who on average had clinically meaningful low or high risk of adverse pathology. The RS27 was rounded to the nearest integer before the cut-points defining the RS27 groups were applied.

TABLE 13
RS27 Group     Score
Low            Less than 16
Intermediate   Greater than or equal to 16 and less than 30
High           Greater than or equal to 30
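A minimal sketch of the grouping rule in Table 13 follows. The text states only that RS27 is rounded to the nearest integer before the cut-points are applied; the tie-breaking rule used here (halves round up) is an assumption, and the function name is hypothetical.

```python
import math


def rs27_group(score: float) -> str:
    """Assign an RS27 risk group using the Table 13 cut-points (16 and 30)."""
    # Assumption: ties such as 15.5 round up. Python's built-in round() uses
    # round-half-to-even, so an explicit floor(x + 0.5) is used instead.
    rounded = math.floor(score + 0.5)
    if rounded < 16:
        return "Low"
    if rounded < 30:
        return "Intermediate"
    return "High"
```

Rounding before applying the cut-points means that, under this tie rule, a score of 15.5 falls in the Intermediate group while 15.4 falls in the Low group.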

Assay Methods

Paraffin from the samples was removed with Shandon Xylene substitute (Thermo Scientific, Kalamazoo, Mich.). Nucleic acids were isolated using the Agencourt® FormaPure® XP kit (Beckman Coulter, Beverly, Mass.).

The amount of RNA was determined using the Quant-iT™ RiboGreen® RNA Assay kit (Invitrogen™, Carlsbad, Calif.). Quantitated RNA was converted to complementary DNA using the Omniscript® RT kit (Qiagen, Valencia, Calif.) and combined with the reverse primers for the 12 genes of RS27 and 5 normalization genes (ARF1, ATP5E, CLTC, GPS1, PGK1) as shown in Table A. The reaction was incubated at 37° C. for 60 minutes and then inactivated at 93° C. for 5 minutes.

The cDNA was preamplified using a custom TaqMan® PreAmp Master Mix made for Genomic Health, Inc. by Life Technologies (Carlsbad, Calif.) and the forward and reverse primers for all targets as shown in Table A. The reaction was placed in a thermocycler (DNA Engine® PTC 200G, Bio-Rad, Hercules, Calif.) and incubated under the following conditions: A) 95° C. for 15 sec; B) 60° C. for 4 min; C) 95° C. for 15 sec; and D) steps B and C were repeated 8 times. The amplified product was then mixed with the forward and reverse primers and probes for each of the targets as shown in Table A and the QuantiTect® Primer Assay master mix (Qiagen, Valencia, Calif.) and amplified for 45 cycles in a LightCycler® 480 (Roche Applied Science, Indianapolis, Ind.). The level of expression was calculated using the crossing-point (Cp) method.

Results

RS27 significantly predicted adverse pathology, non-organ-confined disease, high-grade disease, and high-grade or non-organ-confined disease, and added value beyond biopsy Gleason Score, as shown in Tables 14, 15, 16, and 17, respectively.

TABLE 14 Prediction of Adverse Pathology
Variable                                      LR Chi-Square   DF   P-value
RS27 Score                                    19.31           5    0.002
Central Biopsy Gleason Score 3 + 4 vs 3 + 3   32.86           5    <0.001
Results obtained from the multivariable multinomial logistic model for cells 2, 3, 4, 5, and 6 vs 1 in Table 12.
DF = degrees of freedom

TABLE 15 Prediction of Non-organ-confined disease
Model                 Variables                                     Odds Ratio   95% Confidence Interval   DF   LR Chi-Square   P-value
Univariable Model     RS27                                          2.20         (1.46, 3.31)              1    14.44           <0.001
Multivariable Model   RS27                                          1.93         (1.25, 2.96)              1    8.97            0.003
                      Central Biopsy Gleason Score 3 + 4 vs 3 + 3   1.79         (1.04, 3.10)              1    4.23            0.040
Results obtained from univariable and multivariable binary logistic regression models for cells 2, 4, 6 vs. 1, 3, 5 in Table 12.
Odds ratio for RS27 was per 20 unit increase.
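Because the odds ratios for RS27 are reported per 20-unit increase, it can help to see how such a value relates to other increments. In a logistic model with a linear term for the score, the log odds ratio scales proportionally with the size of the increment. The sketch below applies that identity; the function name is hypothetical, and the conversion assumes the linear-in-score model form.

```python
import math


def rescale_odds_ratio(odds_ratio: float, from_units: float, to_units: float) -> float:
    """Convert an odds ratio reported per `from_units` to one per `to_units`.

    Assumes a logistic model that is linear in the score, so that
    log(OR) is proportional to the size of the increment.
    """
    if odds_ratio <= 0 or from_units <= 0:
        raise ValueError("odds_ratio and from_units must be positive")
    return math.exp(math.log(odds_ratio) * to_units / from_units)


# Univariable odds ratio for RS27 from Table 15 (per 20-unit increase),
# re-expressed per 1-unit increase.
per_unit = rescale_odds_ratio(2.20, from_units=20, to_units=1)
```

Under this assumption, an odds ratio of 2.20 per 20 units corresponds to roughly a 4% increase in odds per single RS27 unit.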

TABLE 16 Prediction of high-grade disease
Model                 Variables                                     Odds Ratio   95% Confidence Interval   DF   LR Chi-Square   P-value
Univariable Model     RS27                                          2.48         (1.60, 3.85)              1    16.78           <0.001
Multivariable Model   RS27                                          2.32         (1.46, 3.67)              1    12.92           <0.001
                      Central Biopsy Gleason Score 3 + 4 vs 3 + 3   1.36         (0.75, 2.47)              1    0.98            0.322
Results obtained from univariable and multivariable binary logistic regression models for cells 5, 6 vs. 1-4 in Table 12.
Odds ratio for RS27 was per 20 unit increase.

TABLE 17 Prediction of high-grade or non-organ-confined disease
Model                 Variables                                     Odds Ratio   95% Confidence Interval   DF   LR Chi-Square   P-value
Univariable Model     RS27                                          2.23         (1.52, 3.27)              1    17.77           <0.001
Multivariable Model   RS27                                          1.93         (1.30, 2.88)              1    10.70           0.001
                      Central Biopsy Gleason Score 3 + 4 vs 3 + 3   1.94         (1.17, 3.21)              1    6.45            0.011
Results obtained from univariable and multivariable binary logistic regression models for cells 2, 4, 5, 6 vs. 1, 3 in Table 12.
Odds ratio for RS27 was per 20 unit increase.

In addition, RS27 predicted adverse pathology beyond conventional clinical/pathology treatment factors as shown in Table 18.

TABLE 18 Prediction of Adverse Pathology Beyond Conventional Clinical/Pathology Treatment Factors
Variable                                LR Chi-Square   DF   P-value
RS27                                    21.46           5    <0.001
Original Biopsy Gleason Score           22.77           5    <0.001
RS27                                    19.31           5    0.002
Central Biopsy Gleason Score            32.86           5    <0.001
RS27                                    30.09           5    <0.001
Clin T2 v. T1                           11.94           5    0.036
RS27                                    30.17           5    <0.001
Baseline PSA (ng/ml) <10 v. >= 10       10.44           5    0.064
RS27                                    30.75           5    <0.001
Continuous PSA                          15.17           5    0.010
RS27                                    26.36           5    <0.001
Age                                     19.05           5    0.002
RS27                                    29.20           5    <0.001
Pct Core Positive                       4.75            5    0.447

When added to conventional clinical/pathology tools such as CAPRA, RS27 further refined the risk of high-grade or non-organ-confined disease. Using CAPRA alone, 5% of patients were identified as having greater than 85% probability of being free from high-grade or non-organ-confined disease, compared to 22% of patients so identified using RS27 in addition to CAPRA (FIG. 8).

When added to conventional clinical/pathology tools such as AUA (D'Amico et al., JAMA 280:969-974, 1998), RS27 further refined the risk of high-grade or non-organ-confined disease. As shown in FIG. 9, using AUA alone, 0% of patients were identified as having greater than 80% probability of being free from high-grade or non-organ-confined disease, compared to 27% of patients so identified using GPS in addition to AUA.

Individual genes and gene groups of RS27 were also associated with adverse pathology, high-grade disease, non-organ-confined disease, and high-grade or non-organ-confined disease, in univariable analyses as shown in Tables 19, 20, 21, and 22, respectively.

TABLE 19 Association of Genes and Gene Groups with Adverse Pathology, Univariable Analyses
Genes and Gene Groups               LR Chi-Square   DF   P-value
BGN                                 7.11            5    0.213
COL1A1                              7.88            5    0.163
SFRP4                               8.87            5    0.114
FLNC                                12.26           5    0.031
GSN                                 5.73            5    0.333
GSTM2                               1.84            5    0.870
TPM2                                18.33           5    0.003
AZGP1                               22.87           5    <0.001
KLK2                                5.97            5    0.309
FAM13C1                             21.55           5    <0.001
SRD5A2                              9.10            5    0.105
SRD5A2 Thresholded                  9.25            5    0.099
TPX2                                14.26           5    0.014
TPX2 Thresholded                    23.34           5    <0.001
Ref Gene Average                    3.27            5    0.659
Stromal Group Score                 9.84            5    0.080
Cellular Organization Group Score   8.04            5    0.154
Androgen Group Score                29.46           5    <0.001
Proliferation Group Score           23.34           5    <0.001
GPS                                 29.98           5    <0.001

TABLE 20 Association of Genes and Gene Groups with High-Grade Disease, Univariable Analyses
Gene                                Chi-Square   DF   P-value   OR (95% CI)
BGN                                 3.67         1    0.055     1.46 (0.99, 2.15)
COL1A1                              2.33         1    0.127     1.32 (0.93, 1.87)
SFRP4                               6.08         1    0.014     1.33 (1.06, 1.67)
FLNC                                3.04         1    0.081     0.77 (0.57, 1.03)
GSN                                 0.14         1    0.710     0.94 (0.67, 1.32)
GSTM2                               0.03         1    0.870     0.97 (0.69, 1.37)
TPM2                                2.85         1    0.091     0.76 (0.56, 1.04)
AZGP1                               12.69        1    <0.001    0.58 (0.42, 0.79)
KLK2                                3.50         1    0.061     0.62 (0.38, 1.02)
FAM13C1                             9.29         1    0.002     0.51 (0.33, 0.79)
SRD5A2                              3.26         1    0.071     0.76 (0.56, 1.02)
SRD5A2 Thresholded                  2.70         1    0.100     0.75 (0.53, 1.06)
TPX2                                1.72         1    0.190     1.21 (0.91, 1.59)
TPX2 Thresholded                    7.38         1    0.007     1.93 (1.20, 3.11)
Ref Gene Average                    1.18         1    0.277     0.86 (0.65, 1.14)
Stromal Response Group Score        4.92         1    0.027     1.49 (1.05, 2.12)
Cellular Organization Group Score   1.12         1    0.290     0.87 (0.66, 1.13)
Androgen Group Score                15.07        1    <0.001    0.69 (0.58, 0.83)
Proliferation Group Score           7.38         1    0.007     1.93 (1.20, 3.11)

TABLE 21 Association of Genes and Gene Groups with Non-Organ-Confined Disease, Univariable Analyses
Gene                                Chi-Square   DF   p-value   Odds Ratio   95% CI
BGN                                 2.58         1    0.109     1.34         (0.94, 1.91)
COL1A1                              2.90         1    0.089     1.33         (0.96, 1.83)
SFRP4                               4.39         1    0.036     1.25         (1.01, 1.54)
FLNC                                0.34         1    0.560     0.92         (0.70, 1.21)
GSN                                 0.27         1    0.603     0.92         (0.67, 1.26)
GSTM2                               0.16         1    0.693     0.94         (0.68, 1.29)
TPM2                                0.51         1    0.473     0.90         (0.67, 1.20)
AZGP1                               12.48        1    <0.001    0.59         (0.44, 0.80)
KLK2                                2.10         1    0.148     0.71         (0.45, 1.12)
FAM13C1                             12.42        1    0.000     0.48         (0.32, 0.73)
SRD5A2                              2.42         1    0.120     0.80         (0.60, 1.06)
SRD5A2 Thresholded                  2.65         1    0.103     0.77         (0.56, 1.06)
TPX2                                6.38         1    0.012     1.39         (1.08, 1.81)
TPX2 Thresholded                    6.51         1    0.011     1.82         (1.14, 2.89)
Ref Gene Average                    0.33         1    0.563     0.93         (0.73, 1.19)
Stromal Response Group Score        4.24         1    0.040     1.41         (1.02, 1.95)
Cellular Organization Group Score   0.45         1    0.504     0.92         (0.72, 1.18)
Androgen Group Score                14.64        1    <0.001    0.71         (0.60, 0.85)
Proliferation Group Score           6.51         1    0.011     1.82         (1.14, 2.89)

TABLE 22 Association of Genes and Gene Groups with High-Grade or Non-Organ-Confined Disease, Univariable Analyses
Gene                                Chi-Square   DF   p-value   Odds Ratio   95% CI
BGN                                 3.15         1    0.076     1.33         (0.97, 1.84)
COL1A1                              1.96         1    0.162     1.23         (0.92, 1.65)
SFRP4                               7.08         1    0.008     1.29         (1.07, 1.55)
FLNC                                2.36         1    0.125     0.83         (0.65, 1.06)
GSN                                 0.45         1    0.503     0.91         (0.68, 1.20)
GSTM2                               0.49         1    0.484     0.90         (0.68, 1.20)
TPM2                                2.24         1    0.135     0.82         (0.63, 1.06)
AZGP1                               12.20        1    0.001     0.61         (0.46, 0.82)
KLK2                                2.18         1    0.140     0.73         (0.48, 1.11)
FAM13C1                             11.13        1    0.001     0.53         (0.37, 0.78)
SRD5A2                              4.36         1    0.037     0.76         (0.59, 0.98)
SRD5A2 Thresholded                  4.63         1    0.032     0.73         (0.55, 0.98)
TPX2                                3.50         1    0.062     1.25         (0.99, 1.58)
TPX2 Thresholded                    5.86         1    0.016     1.73         (1.09, 2.74)
Ref Gene Average                    0.68         1    0.041     0.91         (0.73, 1.14)
Stromal Response Group Score        4.59         1    0.032     1.37         (1.03, 1.84)
Cellular Organization Group Score   1.54         1    0.215     0.87         (0.70, 1.08)
Androgen Group Score                16.56        1    <0.001    0.72         (0.61, 0.84)
Proliferation Group Score           5.86         1    0.016     1.73         (1.09, 2.74)

Example 5 RS27 Adds Value Beyond PTEN/TMPRSS2-ERG Status in Predicting Clinical Recurrence

PTEN mutation and TMPRSS2-ERG fusion genes are commonly associated with poor prognosis in prostate cancer. Here, RS27 was analyzed to determine whether it can provide value beyond PTEN/TMPRSS2-ERG status in predicting clinical recurrence.

PTEN and TMPRSS2-ERG fusion expression levels obtained in the gene identification study described in Example 1 above and in U.S. Pub. No. 20120028264 were used to stratify patients into PTEN low and PTEN normal groups. The PTEN and TMPRSS2-ERG (“T2-ERG”) status of the patients was found to be as follows:

TABLE 23 Distribution of PTEN Expression by T2-ERG Status

              T2-ERG Negative (53%)  T2-ERG Positive (47%)
Median PTEN   8.9                    8.7
25% PTEN      8.7                    8.4

A cutpoint for “PTEN low” was made at <=8.5, which included approximately 13% of T2-ERG negative patients and 28% of T2-ERG positive patients. PTEN normal was defined as >8.5.

Univariable Cox proportional hazards regression was applied to evaluate the association between PTEN status and time to clinical recurrence (cR). FIG. 10 and Table 24 show that PTEN low patients have a higher risk of recurrence compared to PTEN normal patients.

TABLE 24

Chi Sq  P-value  HR    95% CI
12.44   <0.001   0.38  (0.22, 0.65)

When the patients were further stratified into PTEN low/T2-ERG negative (“category 0”), PTEN low/T2-ERG positive (“category 1”), PTEN normal/T2-ERG negative (“category 2”), and PTEN normal/T2-ERG positive (“category 3”), both PTEN low categories had higher recurrence rates than the PTEN normal categories, as shown in FIG. 11 and Table 25.

TABLE 25

PTEN/T2-ERG categories  CHISQ  P-VALUE  95% CI
Cat 1 v 0               0.93   0.34     (0.28, 1.55)
Cat 2 v 0               11.80  <0.01    (0.12, 0.56)
Cat 3 v 0               7.05   0.01     (0.16, 0.76)
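
The stratification used in this example can be sketched in a few lines. The cutpoint (PTEN low at <=8.5, PTEN normal at >8.5) and the category numbering 0 to 3 come from the text above; the function names, and the representation of T2-ERG status as a boolean, are assumptions made only for illustration.

```python
PTEN_LOW_CUTPOINT = 8.5  # from the text: "PTEN low" is <= 8.5, "PTEN normal" is > 8.5


def pten_status(pten_expression: float) -> str:
    """Return 'low' or 'normal' for a PTEN expression value."""
    return "low" if pten_expression <= PTEN_LOW_CUTPOINT else "normal"


def pten_t2erg_category(pten_expression: float, t2erg_positive: bool) -> int:
    """Map a patient to PTEN/T2-ERG category 0-3 as defined in the text.

    0: PTEN low / T2-ERG negative     1: PTEN low / T2-ERG positive
    2: PTEN normal / T2-ERG negative  3: PTEN normal / T2-ERG positive
    """
    if pten_status(pten_expression) == "low":
        return 1 if t2erg_positive else 0
    return 3 if t2erg_positive else 2


# Hypothetical patients: (PTEN expression, T2-ERG positive?)
for pten, t2erg in [(8.5, False), (8.4, True), (8.9, False), (8.7, True)]:
    print(pten, t2erg, pten_status(pten), pten_t2erg_category(pten, t2erg))
```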

The tables below summarize the results of a multivariable model with PTEN/T2-ERG status (Table 26) or PTEN status (Table 27), RS27, and biopsy Gleason Score (Bx GS), demonstrating that RS27 adds value beyond PTEN and T2-ERG markers and Biopsy GS in predicting clinical recurrence.

TABLE 26

VARIABLE                  DF  CHISQ  P-VALUE  HR    95% CI
RS27                      1   64.13  <0.01    1.07  (1.05, 1.09)
PTEN/T2-ERG Status        3   1.59   0.66
PTEN/T2-ERG (Cat 1 v. 0)  1   0.06   0.80     0.91  (0.41, 1.98)
PTEN/T2-ERG (Cat 2 v. 0)  1   1.17   0.28     0.65  (0.29, 1.42)
PTEN/T2-ERG (Cat 3 v. 0)  1   0.14   0.71     0.86  (0.39, 1.89)
Bx GS                     2   7.19   0.03
Bx GS (7 v. 6)            1   6.86   0.01     0.40  (0.20, 0.79)
Bx GS (8+ v. 6)           1   1.35   0.24     0.69  (0.36, 1.29)

TABLE 27

VARIABLE         DF  CHISQ  P-VALUE  HR    95% CI
GPS              1   66.67  <0.01    1.07  (1.05, 1.09)
PTEN Status      1   0.86   0.35     0.78  (0.46, 1.32)
Bx GS            2   6.43   0.04
Bx GS (7 v. 6)   1   6.12   0.01     0.42  (0.21, 0.84)
Bx GS (8+ v. 6)  1   1.15   0.28     0.71  (0.37, 1.33)
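
To help interpret a hazard ratio close to 1 for a continuous score, the following sketch rescales a per-unit hazard ratio to a larger score difference. It assumes that the hazard ratio of 1.07 reported for RS27 is expressed per one-unit increase in the score, which the tables do not state explicitly; under a Cox proportional hazards model the hazard ratio for a difference of k units is then the per-unit hazard ratio raised to the power k. The function name and the 20-unit difference are illustrative only.

```python
import math


def scaled_hazard_ratio(hr_per_unit: float, delta_units: float) -> float:
    """Hazard ratio for a score difference of `delta_units`.

    In a Cox model the log hazard is linear in the covariate, so
    HR(delta) = exp(delta * ln(HR per unit)) = HR_per_unit ** delta.
    """
    return math.exp(delta_units * math.log(hr_per_unit))


# RS27 row of Table 26: HR 1.07, 95% CI (1.05, 1.09), assumed to be per unit.
hr, ci_low, ci_high = 1.07, 1.05, 1.09
delta = 20  # hypothetical score difference chosen for illustration

print(scaled_hazard_ratio(hr, delta))
print(scaled_hazard_ratio(ci_low, delta), scaled_hazard_ratio(ci_high, delta))
```

The confidence limits are rescaled the same way because they are also linear on the log-hazard scale.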

TABLE A

Official Symbol  Sequence ID  SEQ ID NO  Forward Primer Sequence
ALDH1A2  NM_170696.1  1  CACGTCTGTCCCTCTCTGCT
ANPEP  NM_001150.2  5  CCACCTTGGACCAAAGTAAAGC
AR  NM_000044.2  9  CGACTTCACCGCACCTGAT
ARF1  NM_001658.2  13  CAGTAGAGATCCCCGCAACT
ASPN  NM_017680.4  17  CATTGCCACTTCAACTCTAA
ATP5E  NM_006886.2  21  CCGCTTTCGCTACAGCAT
AZGP1  NM_001185.2  25  GAGGCCAGCTAGGAAGCAA
BGN  NM_001711.3  29  GAGCTCCGCAAGGATGAC
BIN1  NM_004305.1  33  CCTGCAAAAGGGAACAAGAG
BMP6  NM_001718.4  37  GTGCAGACCTTGGTTCACCT
C7  NM_000587.2  41  ATGTCTGAGTGTGAGGCGG
CADM1  NM_014333.2  45  CCACCACCATCCTTACCATC
CD276  NM_001024736.1  49  CCAAAGGATGCGATACACAG
CD44  NM_000610.3  53  GGCACCACTGCTTATGAAGG
CDC20  NM_001255.2  57  AGTGACCTGCACTCGCTGCT
CDKN2C  NM_001262.2  61  TGAAGGGAACCTGCCCTTGCA
CLTC  NM_004859.1  65  ACCGTATGGACAGCCACAG
COL1A1  NM_000088.2  69  GTGGCCATCCAGCTGACC
COL1A2  NM_000089.2  73  CAGCCAAGAACTGGTATAGGAGCT
COL3A1  NM_000090.3  77  GGAGGTTCTGGACCTGCTG
COL4A1  NM_001845.4  81  ACAAAGGCCTCCCAGGAT
COL5A2  NM_000393.3  85  GGTCGAGGAACCCAAGGT
COL6A1  NM_001848.2  89  GGAGACCCTGGTGAAGCTG
COL8A1  NM_001850.3  93  TGGTGTTCCAGGGCTTCT
CSF1  NM_000757.3  97  TGCAGCGGCTGATTGACA
CSRP1  NM_004078.1  101  ACCCAAGACCCTGCCTCT
CYP3A5  NM_000777.2  105  TCATTGCCCAGTATGGAGATG
DES  NM_001927.3  109  ACTTCTCACTGGCCGACG
DPP4  NM_001935.3  113  GTCCTGGGATCGGGAAGT
DUSP1  NM_004417.2  117  AGACATCAGCTCCTGGTTCA
EGR1  NM_001964.2  121  GTCCCCGCTGCAGATCTCT
EGR3  NM_004430.2  125  CCATGTGGATGAATGAGGTG
ERG  NM_004449.3  129  CCAACACTAGGCTCCCCA
F2R  NM_001992.2  133  AAGGAGCAAACCATCCAGG
FAM107A  NM_007177.2  137  TTCTGCCCAGGCCTTCCCAC
FAM13C  NM_198215.2  141  ATCTTCAAAGCGGAGAGCG
FAP  NM_004460.2  145  GTTGGCTCACGTGGGTTAC
FLNC  NM_001458.4  149  CAGGACAATGGTGATGGCT
FN1  NM_002026.2  153  GGAAGTGACAGACGTGAAGGT
FOS  NM_005252.2  157  CGAGCCCTTTGATGACTTCCT
GADD45B  NM_015675.1  161  ACCCTCGACAAGACCACACT
GPM6B  NM_001001994.1  165  ATGTGCTTGGAGTGGCCT
GPS1  NM_004127.4  169  AGTACAAGCAGGCTGCCAAG
GSN  NM_000177.1  173  CTTCTGCTAAGCGGTACATCGA
GSTM1  NM_000561.1  177  AAGCTATGAGGAAAAGAAGTACACGAT
GSTM2  NM_000848.2  181  CTGCAGGCACTCCCTGAAAT
HLF  NM_002126.4  185  CACCCTGCAGGTGTCTGAG
IGF1  NM_000618.1  189  TCCGGAGCTGTGATCTAAGGA
IGFBP2  NM_000597.1  193  GTGGACAGCACCATGAACA
IGFBP6  NM_002178.1  197  TGAACCGCAGAGACCAACAG
IL6ST  NM_002184.2  201  GGCCTAATGTTCCAGATCCT
INHBA  NM_002192.1  205  GTGCCCGAGCCATATAGCA
ITGA7  NM_002206.1  209  GATATGATTGGTCGCTGCTTTG
JUN  NM_002228.2  213  GACTGCAAAGATGGAAACGA
KLK2  NM_005551.3  217  AGTCTCGGATTGTGGGAGG
KRT15  NM_002275.2  221  GCCTGGTTCTTCAGCAAGAC
KRT5  NM_000424.2  225  TCAGTGGAGAAGGAGTTGGA
LAMB3  NM_000228.1  229  ACTGACCAAGCCTGAGACCT
LGALS3  NM_002306.1  233  AGCGGAAAATGGCAGACAAT
MMP11  NM_005940.2  237  CCTGGAGGCTGCAACATACC
MYBL2  NM_002466.1  241  GCCGAGATCGCCAAGATG
NFAT5  NM_006599.2  245  CTGAACCCCTCTCCTGGTC
OLFML3  NM_020190.2  249  TCAGAACTGAGGCCGACAC
PAGE4  NM_007003.2  253  GAATCTCAGCAAGAGGAACCA
PGK1  NM_000291.1  257  AGAGCCAGTTGCTGTAGAACTCAA
PPAP2B  NM_003713.3  261  ACAAGCACCATCCCAGTGA
PPP1R12A  NM_002480.1  265  CGGCAAGGGGTTGATATAGA
PRKCA  NM_002737.1  269  CAAGCAATGCGTCATCAATGT
SDC1  NM_002997.1  273  GAAATTGACGAGGGGTGTCT
SFRP4  NM_003014.2  277  TACAGGATGAGGCTGGGC
SHMT2  NM_005412.4  281  AGCGGGTGCTAGAGCTTGTA
SLC22A3  NM_021977.2  285  ATCGTCAGCGAGTTTGACCT
SMAD4  NM_005359.3  289  GGACATTACTGGCCTGTTCACA
SPARC  NM_003118.1  293  TCTTCCCTGTACACTGGCAGTTC
SRC  NM_005417.3  297  TGAGGAGTGGTATTTTGGCAAGA
SRD5A2  NM_000348.2  301  GTAGGTCTCCTGGCGTTCTG
STAT5B  NM_012448.1  305  CCAGTGGTGGTGATCGTTCA
TGFB1I1  NM_001042454.1  309  GCTACTTTGAGCGCTTCTCG
THBS2  NM_003247.2  313  CAAGACTGGCTACATCAGAGTCTTAGTG
TNFRSF10B  NM_003842.2  317  CTCTGAGACAGTGCTTCGATGACT
TPM2  NM_213674.1  321  AGGAGATGCAGCTGAAGGAG
TPX2  NM_012112.2  325  TCAGCTGTGAGCTGCGGATA
TUBB2A  NM_001069.1  329  CGAGGACGAGGCTTAAAAAC
UBE2T  NM_014176.1  333  TGTTCTCAAATTGCCACCAA
VCL  NM_003373.2  337  GATACCACAACTCCCATCAAGCT
ZFP36  NM_003407.1  341  CATTAACCCACTCCCCTGA

Official Symbol  SEQ ID NO  Reverse Primer Sequence  SEQ ID NO  Probe Sequence
ALDH1A2  2  GACCGTGGCTCAACTTTGTAT  3  TCTCTGTAGGGCCCAGCTCTCAGG
ANPEP  6  TCTCAGCGTCACCTGGTAGGA  7  CTCCCCAACACGCTGAAACCCG
AR  10  TGACACAAGTGGGACTGGGATA  11  ACCATGCCGCCAGGGTACCACA
ARF1  14  ACAAGCACATGGCTATGGAA  15  CTTGTCCTTGGGTCACCCTGCA
ASPN  18  ATTGTTAGTGTCCAGGCTCT  19  TATCCCTTTGGAAGACCTTGCTTG
ATP5E  22  TGGGAGTATCGGATGTAGCTG  23  TCCAGCCTGTCTCCAGTAGGCCAC
AZGP1  26  CAGGAAGGGCAGCTACTGG  27  TCTGAGATCCCACATTGCCTCCAA
BGN  30  CTTGTTGTTCACCAGGACGA  31  CAAGGGTCTCCAGCACCTCTACGC
BIN1  34  CGTGGTTGACTCTGATCTCG  35  CTTCGCCTCCAGATGGCTCCC
BMP6  38  CTTAGTTGGCGCACAGCAC  39  TGAACCCCGAGTATGTCCCCAAAC
C7  42  AGGCCTTATGCTGGTGACAG  43  ATGCTCTGCCCTCTGCATCTCAGA
CADM1  46  GATCCACTGCCCTGATCG  47  TCTTCACCTGCTCGGGAATCTGTG
CD276  50  GGATGACTTGGGAATCATGTC  51  CCACTGTGCAGCCTTATTTCTCCAATG
CD44  54  GATGCTCATGGTGAATGAGG  55  ACTGGAACCCAGAAGCACACCCTC
CDC20  58  GGCTTCCTTGGCTTTGCGCT  59  CCAATGCACCCCCTGCGCGCTGGC
CDKN2C  62  TGTGCTTCACCAGGAACTCCACC  63  TGGCTGCCAAAGAAGGCCACCTCCGGGT
CLTC  66  TGACTACAGGATCAGCGCTTC  67  TCTCACATGCTGTACCCAAAGCCA
COL1A1  70  CAGTGGTAGGTGATGTTCTGGGA  71  TCCTGCGCCTGATGTCCACCG
COL1A2  74  AAACTGGCTGCCAGCATTG  75  TCTCCTAGCCAGACGTGTTTCTTGTCCTTG
COL3A1  78  ACCAGGACTGCCACGTTC  79  CTCCTGGTCCCCAAGGTGTCAAAG
COL4A1  82  GAGTCCCAGGAAGACCTGCT  83  CTCCTTTGACACCAGGGATGCCAT
COL5A2  86  GCCTGGAGGTCCAACTCTG  87  CCAGGAAATCCTGTAGCACCAGGC
COL6A1  90  TCTCCAGGGACACCAACG  91  CTTCTCTTCCCTGATCACCCTGCG
COL8A1  94  CCCTGTAAACCCTGATCCC  95  CCTAAGGGAGAGCCAGGAATCCCA
CSF1  98  CAACTGTTCCTGGTCTACAAACTCA  99  TCAGATGGAGACCTCGTGCCAAATTACA
CSRP1  102  GCAGGGGTGGAGTGATGT  103  CCACCCTTCTCCAGGGACCCTTAG
CYP3A5  106  GACAGGCTTGCCTTTCTCTG  107  TCCCGCCTCAAGTTTCTCACCAAT
DES  110  GCTCCACCTTCTCGTTGGT  111  TGAACCAGGAGTTTCTGACCACGC
DPP4  114  GTACTCCCACCGGGATACAG  115  CGGCTATTCCACACTTGAACACGC
DUSP1  118  GACAAACACCCTTCCTCCAG  119  CGAGGCCATTGACTTCATAGACTCCA
EGR1  122  CTCCAGCTTAGGGTAGTTGTCCAT  123  CGGATCCTTTCCTCACTCGCCCA
EGR3  126  TGCCTGAGAAGAGGTGAGGT  127  ACCCAGTCTCACCTTCTCCCCACC
ERG  130  CCTCCGCCAGGTCTTTAGT  131  AGCCATATGCCTTCTCATCTGGGC
F2R  134  GCAGGGTTTCATTGAGCAC  135  CCCGGGCTCAACATCACTACCTGT
FAM107A  138  AGGAGCTGGGGTGTACGGAGA  139  TCTCCGAGGCTCCCCAGGGCCCCG
FAM13C  142  GCTGGATACCACATGCTCTG  143  TCCTGACTTTCTCCGTGGCTCCTC
FAP  146  GACAGGACCGAAACATTCTG  147  AGCCACTGCAAACATACTCGTTCATCA
FLNC  150  TGATGGTGTACTCGCCAGG  151  ATGTGCTGTCAGCTACCTGCCCAC
FN1  154  ACACGGTAGCCGGTCACT  155  ACTCTCAGGCGGTGTCCACATGAT
FOS  158  GGAGCGGGCTGTCTCAGA  159  TCCCAGCATCATCCAGGCCCAG
GADD45B  162  TGGGAGTTCATGGGTACAGA  163  TGGGAGTTCATGGGTACAGA
GPM6B  166  TGTAGAACATAAACACGGGCA  167  CGCTGAGAAACCAAACACACCCAG
GPS1  170  GCAGCTCAGGGAAGTCACA  171  CCTCCTGCTGGCTTCCTTTGATCA
GSN  174  GGCTCAAAGCCTTGCTTCAC  175  ACCCAGCCAATCGGGATCGGC
GSTM1  178  GGCCCAGCTTGAATTTTTCA  179  TCAGCCACTGGCTTCTGTCATAATCAGGAG
GSTM2  182  CCAAGAAACCATGGCTGCTT  183  CTGAAGCTCTACTCACAGTTTCTGGG
HLF  186  GGTACCTAGGAGCAGAAGGTGA  187  TAAGTGATCTGCCCTCCAGGTGGC
IGF1  190  CGGACAGAGCGAGCTGACTT  191  TGTATTGCGCACCCCTCAAGCCTG
IGFBP2  194  CCTTCATACCCGACTTGAGG  195  CTTCCGGCCAGCACTGCCTC
IGFBP6  198  GTCTTGGACACCCGCAGAAT  199  ATCCAGGCACCTCTACCACGCCCTC
IL6ST  202  AAAATTGTGCCTTGGAGGAG  203  CATATTGCCCAGTGGTCACCTCACA
INHBA  206  CGGTAGTGGTTGATGACTGTTGA  207  ACGTCCGGGTCCTCACTGTCCTTCC
ITGA7  210  AGAACTTCCATTCCCCACCAT  211  CAGCCAGGACCTGGCCATCCG
JUN  214  TAGCCATAAGGTCCGCTCTC  215  CTATGACGATGCCCTCAACGCCTC
KLK2  218  TGTACACAGCCACCTGCC  219  TTGGGAATGCTTCTCACACTCCCA
KRT15  222  CTTGCTGGTCTGGATCATTTC  223  TGAACAAAGAGGTGGCCTCCAACA
KRT5  226  TGCCATATCCAGAGGAAACA  227  CCAGTCAACATCTCTGTTGTCACAAGCA
LAMB3  230  GTCACACTTGCAGCATTTCA  231  CCACTCGCCATACTGGGTGCAGT
LGALS3  234  CTTGAGGGTTTGGGTTTCCA  235  ACCCAGATAACGCATCATGGAGCGA
MMP11  238  TACAATGGCTTTGGAGGATAGCA  239  ATCCTCCTGAAGCCCTTTTCGCAGC
MYBL2  242  CTTTTGATGGTAGAGTTCCAGTGATTC  243  CAGCATTGTCTGTCCTCCCTGGCA
NFAT5  246  AGGAAACGATGGCGAGGT  247  CGAGAATCAGTCCCCGTGGAGTTC
OLFML3  250  CCAGATAGTCTACCTCCCGCT  251  CAGACGATCCACTCTCCCGGAGAT
PAGE4  254  GTTCTTCGATCGGAGGTGTT  255  CCAACTGACAATCAGGATATTGAACCTGG
PGK1  258  CTGGGCCTACACAGTCCTTCA  259  TCTCTGCTGGGCAAGGATGTTCTGTTC
PPAP2B  262  CACGAAGAAAACTATGCAGCAG  263  ACCAGGGCTCCTTGAGCAAATCCT
PPP1R12A  266  TGCCTGGCATCTCTAAGCA  267  CCGTTCTTCTTCCTTTCGAGCTGC
PRKCA  270  GTAAATCCGCCCCCTCTTCT  271  CAGCCTCTGCGGAATGGATCACACT
SDC1  274  AGGAGCTAACGGAGAACCTG  275  CTCTGAGCGCCTCCATCCAAGG
SFRP4  278  GTTGTTAGGGCAAGGGGC  279  CCTGGGACAGCCTATGTAAGGCCA
SHMT2  282  ATGGCACTTCGGTCTCCA  283  CCATCACTGCCAACAAGAACACCTG
SLC22A3  286  CAGGATGGCTTGGGTGAG  287  CAGCATCCACGCATTGACACAGAC
SMAD4  290  ACCAATACTCAGGAGCAGGATGA  291  TGCATTCCAGCCTCCCATTTCCA
SPARC  294  AGCTCGGTGTGGGAGAGGTA  295  TGGACCAGCACCCCATTGACGG
SRC  298  CTCTCGGGTTCTCTGCATTGA  299  AACCGCTCTGACTCCCGTCTGGTG
SRD5A2  302  TCCCTGGAAGGGTAGGAGTAA  303  AGACACCACTCAGAATCCCCAGGC
STAT5B  306  GCAAAAGCATTGTCCCAGAGA  307  CAGCCAGGACAACAATGCGACGG
TGFB1I1  310  GGTCACCATCTTGTGTCGG  311  CAAGATGTGGCTTCTGCAACCAGC
THBS2  314  CAGCGTAGGTTTGGTCATAGATAGG  315  TGAGTCTGCCATGACCTGTTTTCCTTCAT
TNFRSF10B  318  CCATGAGGCCCAACTTCCT  319  CAGACTTGGTGCCCTTTGACTCC
TPM2  322  CCACCTCTTCATATTTGCGG  323  CCAAGCACATCGCTGAGGATTCAG
TPX2  326  ACGGTCCTAGGTTTGAGGTTAAGA  327  CAGGTCCCATTGCCGGGCG
TUBB2A  330  ACCATGCTTGAGGACAACAG  331  TCTCAGATCAATCGTGCATCCTTAGTGAA
UBE2T  334  AGAGGTCAACACAGTTGCGA  335  AGGTGCTTGGAGACCATCCCTCAA
VCL  338  TCCCTGTTAGGCGCATCAG  339  AGTGGCAGCCACGGCGCC
ZFP36  342  CCCCCACCATCATGAATACT  343  CAGGTCCCCAAGTGTGCAAGCTC

Official Symbol  SEQ ID NO  Amplicon Sequence
ALDH1A2  4  CACGTCTGTCCCTCTCTGCTTTCTCTGTAGGGCCCAGCTCTCAGGAATACAAAGTTGAGCCACGGTC
ANPEP  8  CCACCTTGGACCAAAGTAAAGCGTGGAATCGTTACCGCCTCCCCAACACGCTGAAACCCGATTCCTACCGGGTGACGCTGAGA
AR  12  CGACTTCACCGCACCTGATGTGTGGTACCCTGGCGGCATGGTGAGCAGAGTGCCCTATCCCAGTCCCACTTGTGTCA
ARF1  16  CAGTAGAGATCCCCGCAACTCGCTTGTCCTTGGGTCACCCTGCATTCCATAGCCATGTGCTTGT
ASPN  20  CATTGCCACTTCAACTCTAAGGAATATTTTTGAGATATCCCTTTGGAAGACCTTGCTTGGAAGAGCCTGGACACTAACAAT
ATP5E  24  CCGCTTTCGCTACAGCATGGTGGCCTACTGGAGACAGGCTGGACTCAGCTACATCCGATACTCCCA
AZGP1  28  GAGGCCAGCTAGGAAGCAAGGGTTGGAGGCAATGTGGGATCTCAGACCCAGTAGCTGCCCTTCCTG
BGN  32  GAGCTCCGCAAGGATGACTTCAAGGGTCTCCAGCACCTCTACGCCCTCGTCCTGGTGAACAACAAG
BIN1  36  CCTGCAAAAGGGAACAAGAGCCCTTCGCCTCCAGATGGCTCCCCTGCCGCCACCCCCGAGATCAGAGTCAACCACG
BMP6  40  GTGCAGACCTTGGTTCACCTTATGAACCCCGAGTATGTCCCCAAACCGTGCTGTGCGCCAACTAAG
C7  44  ATGTCTGAGTGTGAGGCGGGCGCTCTGAGATGCAGAGGGCAGAGCATCTCTGTCACCAGCATAAGGCCT
CADM1  48  CCACCACCATCCTTACCATCATCACAGATTCCCGAGCAGGTGAAGAAGGCTCGATCAGGGCAGTGGATC
CD276  52  CCAAAGGATGCGATACACAGACCACTGTGCAGCCTTATTTCTCCAATGGACATGATTCCCAAGTCATCC
CD44  56  GGCACCACTGCTTATGAAGGAAACTGGAACCCAGAAGCACACCCTCCCCTCATTCACCATGAGCATC
CDC20  60  AGTGACCTGCACTCGCTGCTTCAGCTGGATGCACCCATCCCCAATGCACCCCCTGCGCGCTGGCAGCGCAAAGCCAAGGAAGCC
CDKN2C  64  TGAAGGGAACCTGCCCTTGCACTTGGCTGCCAAAGAAGGCCACCTCCGGGTGGTGGAGTTCCTGGTGAAGCACA
CLTC  68  ACCGTATGGACAGCCACAGCCTGGCTTTGGGTACAGCATGTGAGATGAAGCGCTGATCCTGTAGTCA
COL1A1  72  GTGGCCATCCAGCTGACCTTCCTGCGCCTGATGTCCACCGAGGCCTCCCAGAACATCACCTACCACTG
COL1A2  76  CAGCCAAGAACTGGTATAGGAGCTCCAAGGACAAGAAACACGTCTGGCTAGGAGAAACTATCAATGCTGGCAGCCAGTTT
COL3A1  80  GGAGGTTCTGGACCTGCTGGTCCTCCTGGTCCCCAAGGTGTCAAAGGTGAACGTGGCAGTCCTGGT
COL4A1  84  ACAAAGGCCTCCCAGGATTGGATGGCATCCCTGGTGTCAAAGGAGAAGCAGGTCTTCCTGGGACTC
COL5A2  88  GGTCGAGGAACCCAAGGTCCGCCTGGTGCTACAGGATTTCCTGGTTCTGCGGGCAGAGTTGGACCTCCAGGC
COL6A1  92  GGAGACCCTGGTGAAGCTGGCCCGCAGGGTGATCAGGGAAGAGAAGGCCCCGTTGGTGTCCCTGGAGA
COL8A1  96  TGGTGTTCCAGGGCTTCTCGGACCTAAGGGAGAGCCAGGAATCCCAGGGGATCAGGGTTTACAGGG
CSF1  100  TGCAGCGGCTGATTGACAGTCAGATGGAGACCTCGTGCCAAATTACATTTGAGTTTGTAGACCAGGAACAGTTG
CSRP1  104  ACCCAAGACCCTGCCTCTTCCACTCCACCCTTCTCCAGGGACCCTTAGATCACATCACTCCACCCCTGC
CYP3A5  108  TCATTGCCCAGTATGGAGATGTATTGGTGAGAAACTTGAGGCGGGAAGCAGAGAAAGGCAAGCCTGTC
DES  112  ACTTCTCACTGGCCGACGCGGTGAACCAGGAGTTTCTGACCACGCGCACCAACGAGAAGGTGGAGC
DPP4  116  GTCCTGGGATCGGGAAGTGGCGTGTTCAAGTGTGGAATAGCCGTGGCGCCTGTATCCCGGTGGGAGTAC
DUSP1  120  AGACATCAGCTCCTGGTTCAACGAGGCCATTGACTTCATAGACTCCATCAAGAATGCTGGAGGAAGGGTGTTTGTC
EGR1  124  GTCCCCGCTGCAGATCTCTGACCCGTTCGGATCCTTTCCTCACTCGCCCCCATGGACAAACTACCCTAAGCTGGAG
EGR3  128  CCATGTGGATGAATGAGGTGTCTCCTTTCCATACCCAGTCTCACCTTCTCCCCACCCTACCTCACCTCTTCTCAGGCA
ERG  132  CCAACACTAGGCTCCCCACCAGCCATATGCCTTCTCATCTGGGCACTTACTACTAAAGACCTGGCGGAGG
F2R  136  AAGGAGCAAACCATCCAGGTGCCCGGGCTCAACATCACTACCTGTCATGATGTGCTCAATGAAACCCTGC
FAM107A  140  TTCTGCCCAGGCCTTCCCACCAGGAATCTCCGAGGCTCCCCAGGGCCCCGCTTCTCCGTACACCCCAGCTCCT
FAM13C  144  ATCTTCAAAGCGGAGAGCGGGAGGAGCCACGGAGAAAGTCAGGAGACAGAGCATGTGGTATCCAGC
FAP  148  GTTGGCTCACGTGGGTTACTGATGAACGAGTATGTTTGCAGTGGCTAAAAATGTTTCGGTCCTGTCAGAGTCCAGA
FLNC  152  CAGGACAATGGTGATGGCTCATGTGCTGTCAGCTACCTGCCCACGGAGCCTGGCGAGTACACCATCA
FN1  156  GGAAGTGACAGACGTGAAGGTCACCATCATGTGGACACCGCCTGAGAGTGCAGTGACCGGCTACCGTGT
FOS  160  CGAGCCCTTTGATGACTTCCTGTTCCCAGCATCATCCAGGCCCAGTGGCTCTGAGACAGCCCGCTCC
GADD45B  164  ACCCTCGACAAGACCACACTTTGGGACTTGGGAGCTGGGGCTGAAGTTGCTCTGTACCCATGAACTCCCA
GPM6B  168  ATGTGCTTGGAGTGGCCTGGCTGGGTGTGTTTGGTTTCTCAGCGGTGCCCGTGTTTATGTTCTACA
GPS1  172  AGTACAAGCAGGCTGCCAAGTGCCTCCTGCTGGCTTCCTTTGATCACTGTGACTTCCCTGAGCTGC
GSN  176  CTTCTGCTAAGCGGTACATCGAGACGGACCCAGCCAATCGGGATCGGCGGACGCCCATCACCGTGGTGAAGCAAGGCTTTGAGCC
GSTM1  180  AAGCTATGAGGAAAAGAAGTACACGATGGGGGACGCTCCTGATTATGACAGAAGCCAGTGGCTGAATGAAAAATTCAAGCTGGGCC
GSTM2  184  CTGCAGGCACTCCCTGAAATGCTGAAGCTCTACTCACAGTTTCTGGGGAAGCAGCCATGGTTTCTTGG
HLF  188  CACCCTGCAGGTGTCTGAGACTAAGTGATCTGCCCTCCAGGTGGCGATCACCTTCTGCTCCTAGGTACC
IGF1  192  TCCGGAGCTGTGATCTAAGGAGGCTGGAGATGTATTGCGCACCCCTCAAGCCTGCCAAGTCAGCTCGCTCTGTCCG
IGFBP2  196  GTGGACAGCACCATGAACATGTTGGGCGGGGGAGGCAGTGCTGGCCGGAAGCCCCTCAAGTCGGGTATGAAGG
IGFBP6  200  TGAACCGCAGAGACCAACAGAGGAATCCAGGCACCTCTACCACGCCCTCCCAGCCCAATTCTGCGGGTGTCCAAGAC
IL6ST  204  GGCCTAATGTTCCAGATCCTTCAAAGAGTCATATTGCCCAGTGGTCACCTCACACTCCTCCAAGGCACAATTTT
INHBA  208  GTGCCCGAGCCATATAGCAGGCACGTCCGGGTCCTCACTGTCCTTCCACTCAACAGTCATCAACCACTACCG
ITGA7  212  GATATGATTGGTCGCTGCTTTGTGCTCAGCCAGGACCTGGCCATCCGGGATGAGTTGGATGGTGGGGAATGGAAGTTCT
JUN  216  GACTGCAAAGATGGAAACGACCTTCTATGACGATGCCCTCAACGCCTCGTTCCTCCCGTCCGAGAGCGGACCTTATGGCTA
KLK2  220  AGTCTCGGATTGTGGGAGGCTGGGAGTGTGAGAAGCATTCCCAACCCTGGCAGGTGGCTGTGTACA
KRT15  224  GCCTGGTTCTTCAGCAAGACTGAGGAGCTGAACAAAGAGGTGGCCTCCAACACAGAAATGATCCAGACCAGCAAG
KRT5  228  TCAGTGGAGAAGGAGTTGGACCAGTCAACATCTCTGTTGTCACAAGCAGTGTTTCCTCTGGATATGGCA
LAMB3  232  ACTGACCAAGCCTGAGACCTACTGCACCCAGTATGGCGAGTGGCAGATGAAATGCTGCAAGTGTGAC
LGALS3  236  AGCGGAAAATGGCAGACAATTTTTCGCTCCATGATGCGTTATCTGGGTCTGGAAACCCAAACCCTCAAG
MMP11  240  CCTGGAGGCTGCAACATACCTCAATCCTGTCCCAGGCCGGATCCTCCTGAAGCCCTTTTCGCAGCACTGCTATCCTCCAAAGCCATTGTA
MYBL2  244  GCCGAGATCGCCAAGATGTTGCCAGGGAGGACAGACAATGCTGTGAAGAATCACTGGAACTCTACCATCAAAAG
NFAT5  248  CTGAACCCCTCTCCTGGTCACCGAGAATCAGTCCCCGTGGAGTTCCCCCTCCACCTCGCCATCGTTTCCT
OLFML3  252  TCAGAACTGAGGCCGACACCATCTCCGGGAGAGTGGATCGTCTGGAGCGGGAGGTAGACTATCTGG
PAGE4  256  GAATCTCAGCAAGAGGAACCACCAACTGACAATCAGGATATTGAACCTGGACAAGAGAGAGAAGGAACACCTCCGATCGAAGAAC
PGK1  260  AGAGCCAGTTGCTGTAGAACTCAAATCTCTGCTGGGCAAGGATGTTCTGTTCTTGAAGGACTGTGTAGGCCCAG
PPAP2B  264  ACAAGCACCATCCCAGTGATGTTCTGGCAGGATTTGCTCAAGGAGCCCTGGTGGCCTGCTGCATAGTTTTCTTCGTG
PPP1R12A  268  CGGCAAGGGGTTGATATAGAAGCAGCTCGAAAGGAAGAAGAACGGATCATGCTTAGAGATGCCAGGCA
PRKCA  272  CAAGCAATGCGTCATCAATGTCCCCAGCCTCTGCGGAATGGATCACACTGAGAAGAGGGGGCGGATTTAC
SDC1  276  GAAATTGACGAGGGGTGTCTTGGGCAGAGCTGGCTCTGAGCGCCTCCATCCAAGGCCAGGTTCTCCGTTAGCTCCT
SFRP4  280  TACAGGATGAGGCTGGGCATTGCCTGGGACAGCCTATGTAAGGCCATGTGCCCCTTGCCCTAACAAC
SHMT2  284  AGCGGGTGCTAGAGCTTGTATCCATCACTGCCAACAAGAACACCTGTCCTGGAGACCGAAGTGCCAT
SLC22A3  288  ATCGTCAGCGAGTTTGACCTTGTCTGTGTCAATGCGTGGATGCTGGACCTCACCCAAGCCATCCTG
SMAD4  292  GGACATTACTGGCCTGTTCACAATGAGCTTGCATTCCAGCCTCCCATTTCCAATCATCCTGCTCCTGAGTATTGGT
SPARC  296  TCTTCCCTGTACACTGGCAGTTCGGCCAGCTGGACCAGCACCCCATTGACGGGTACCTCTCCCACACCGAGCT
SRC  300  TGAGGAGTGGTATTTTGGCAAGATCACCAGACGGGAGTCAGAGCGGTTACTGCTCAATGCAGAGAACCCGAGAG
SRD5A2  304  GTAGGTCTCCTGGCGTTCTGCCAGCTGGCCTGGGGATTCTGAGTGGTGTCTGCTTAGAGTTTACTCCTACCCTTCCAGGGA
STAT5B  308  CCAGTGGTGGTGATCGTTCATGGCAGCCAGGACAACAATGCGACGGCCACTGTTCTCTGGGACAATGCTTTTGC
TGFB1I1  312  GCTACTTTGAGCGCTTCTCGCCAAGATGTGGCTTCTGCAACCAGCCCATCCGACACAAGATGGTGACC
THBS2  316  CAAGACTGGCTACATCAGAGTCTTAGTGCATGAAGGAAAACAGGTCATGGCAGACTCAGGACCTATCTATGACCAAACCTACGCTG
TNFRSF10B  320  CTCTGAGACAGTGCTTCGATGACTTTGCAGACTTGGTGCCCTTTGACTCCTGGGAGCCGCTCATGAGGAAGTTGGGCCTCATGG
TPM2  324  AGGAGATGCAGCTGAAGGAGGCCAAGCACATCGCTGAGGATTCAGACCGCAAATATGAAGAGGTGG
TPX2  328  TCAGCTGTGAGCTGCGGATACCGCCCGGCAATGGGACCTGCTCTTAACCTCAAACCTAGGACCGT
TUBB2A  332  CGAGGACGAGGCTTAAAAACTTCTCAGATCAATCGTGCATCCTTAGTGAACTTCTGTTGTCCTCAAGCATGGT
UBE2T  336  TGTTCTCAAATTGCCACCAAAAGGTGCTTGGAGACCATCCCTCAACATCGCAACTGTGTTGACCTCT
VCL  340  GATACCACAACTCCCATCAAGCTGTTGGCAGTGGCAGCCACGGCGCCTCCTGATGCGCCTAACAGGGA
ZFP36  344  CATTAACCCACTCCCCTGACCTCACGCTGGGGCAGGTCCCCAAGTGTGCAAGCTCAGTATTCATGATGGTGGGGG
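
The relationship between the primer, probe, and amplicon sequences listed in Table A can be checked programmatically. The sketch below assumes the usual layout of such an assay: the amplicon begins with the forward primer, ends with the reverse complement of the reverse primer, and contains the probe on one strand or the other. The function names are illustrative, and the sequences are the ALDH1A2 entries (SEQ ID NOs: 1-4) with extraction whitespace removed.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def check_assay(forward: str, reverse: str, probe: str, amplicon: str) -> dict:
    """Check that primers and probe are consistent with the amplicon."""
    return {
        "forward_at_start": amplicon.startswith(forward),
        "reverse_at_end": amplicon.endswith(reverse_complement(reverse)),
        # The probe may be designed against either strand.
        "probe_inside": probe in amplicon or reverse_complement(probe) in amplicon,
    }


# ALDH1A2 row of Table A
forward = "CACGTCTGTCCCTCTCTGCT"
reverse = "GACCGTGGCTCAACTTTGTAT"
probe = "TCTCTGTAGGGCCCAGCTCTCAGG"
amplicon = (
    "CACGTCTGTCCCTCTCTGCTTTCTCTGTAGGGCCCAGCTCTCAGG"
    "AATACAAAGTTGAGCCACGGTC"
)

print(check_assay(forward, reverse, probe, amplicon))
```

A row for which any of the three checks fails is a candidate for a transcription error in the primer, probe, or amplicon entry.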

What is claimed is:
 1. A method of predicting a likelihood of a clinical outcome for a patient with prostate cancer, comprising: determining a level of one or more RNA transcripts, or an expression product thereof, in a biological sample containing cancer cells obtained from said patient, wherein the one or more RNA transcripts, or an expression product thereof, is selected from BIN1, IGF1, C7, GSN, DES, TGFB1I1, TPM2, VCL, FLNC, ITGA7, COL6A1, PPP1R12A, GSTM1, GSTM2, PAGE4, PPAP2B, SRD5A2, PRKCA, IGFBP6, GPM6B, OLFML3, HLF, CYP3A5, KRT15, KRT5, LAMB3, SDC1, DUSP1, EGR1, FOS, JUN, EGR3, GADD45B, ZFP36, FAM13C, KLK2, ASPN, SFRP4, BGN, THBS2, INHBA, COL1A1, COL3A1, COL1A2, SPARC, COL8A1, COL4A1, FN1, FAP, COL5A2, CDC20, TPX2, UBE2T, MYBL2, and CDKN2C; assigning the one or more RNA transcripts, or an expression product thereof, to one or more gene groups selected from a cellular organization gene group, basal epithelia gene group, a stress response gene group, an androgen gene group, a stromal response gene group, and a proliferation gene group; calculating a quantitative score for the patient by weighting the level of one or more RNA transcripts, or an expression product thereof, by their contribution to a clinical outcome; and predicting a likelihood of a clinical outcome for the patient based on the quantitative score.
 2. The method of claim 1, further comprising: determining a level of at least one RNA transcript, or an expression product thereof, in the biological sample, wherein the at least one RNA transcript, or an expression product thereof, is selected from STAT5B, NFAT5, AZGP1, ANPEP, IGFBP2, SLC22A3, ERG, AR, SRD5A2, GSTM1, and GSTM2; and weighting the level of the at least one RNA transcript, or an expression product thereof, by its contribution to the clinical outcome to calculate the quantitative score.
 3. The method of claim 1, wherein an increase in the quantitative score correlates with an increased likelihood of a negative clinical outcome.
 4. The method of claim 1, wherein the clinical outcome is upgrading of prostate cancer.
 5. The method of claim 1, wherein the clinical outcome is upstaging of prostate cancer.
 6. The method of claim 1, wherein the clinical outcome is recurrence of prostate cancer.
 7. The method of claim 1, wherein the level of at least three RNA transcripts, or their expression products, from the stromal response gene group are determined, and wherein the stromal response gene group comprises ASPN, BGN, COL1A1, SPARC, FN1, COL3A1, COL4A1, INHBA, THBS2, and SFRP4.
 8. The method of claim 1, wherein the level of at least one RNA transcript, or its expression product, from the androgen gene group is determined, and wherein the androgen gene group comprises FAM13C, KLK2, AZGP1, and SRD5A2.
 9. The method of claim 1, further comprising determining the level of at least three RNA transcripts, or their expression products, from the cellular organization gene group, wherein the cellular organization gene group comprises FLNC, GSN, GSTM2, IGFBP6, PPAP2B, PPP1R12A, BIN1, VCL, IGF1, TPM2, C7, and GSTM1.
 10. The method of claim 1, further comprising determining the level of at least one RNA transcript, or its expression product, from the proliferation gene group, wherein the proliferation gene group comprises TPX2, CDC20, and MYBL2.
 11. The method of claim 1, wherein the level of any one of the gene combinations shown in Table 4 are determined.
 12. The method of claim 1, wherein the quantitative score is calculated based on any one of the algorithms shown in Table 4.
 13. The method of claim 1, wherein the RNA transcripts, or their expression products, of BGN, COL1A1, SFRP4, FLNC, GSN, TPM2, FAM13C, and KLK2 are assigned to the following gene groups: a) stromal response gene group: BGN, COL1A1, and SFRP4 b) cellular organization gene group: FLNC, GSN, and TPM2 and c) androgen gene group: FAM13C and KLK2; and wherein the level of the RNA transcripts, or their expression products, of at least one of a)-c) are determined.
 14. A method of predicting a likelihood of a clinical outcome for a patient with prostate cancer, comprising: determining a level of one or more RNA transcripts, or an expression product thereof, in a biological sample containing cancer cells obtained from said patient, wherein the one or more RNA transcripts, or an expression product thereof, is selected from BGN, COL1A1, SFRP4, FLNC, GSN, GSTM2, TPM2, AZGP1, KLK2, FAM13C1, SRD5A2, and TPX2, normalizing the level of the one or more RNA transcripts, or an expression product thereof, to obtain a normalized expression level of the one or more RNA transcripts, or an expression product thereof, comparing the normalized expression level to gene expression data in reference prostate cancer samples, and predicting the likelihood of one or more of adverse pathology, non-organ-confined disease, high-grade disease, or high-grade or non-organ-confined disease in the patient based on the normalized expression level of the one or more RNA transcripts, or an expression product thereof, wherein increased normalized expression levels of BGN, COL1A1, SFRP4, and TPX2 correlate with an increased likelihood of adverse pathology, non-organ-confined disease, high-grade disease, or high-grade or non-organ confined disease, and wherein increased normalized expression levels of FLNC, GSN, GSTM2, TPM2, AZGP1, KLK2, FAM13C1, and SRD5A2 correlate with a decreased likelihood of adverse pathology, non-organ-confined disease, high-grade disease, or high-grade or non-organ confined disease.
 15. The method of claim 14, further comprising assigning the one or more RNA transcripts, or an expression product thereof, to one or more gene groups selected from a cellular organization gene group, an androgen gene group, a stromal response gene group, and a proliferation gene group; calculating one or more quantitative scores for the patient by weighting the normalized expression level of one or more RNA transcripts, or an expression product thereof, by their contribution to a clinical outcome; and predicting the likelihood of one or more of adverse pathology, non-organ-confined disease, high-grade disease, or high-grade or non-organ-confined disease for the patient based on the one or more quantitative scores; wherein the quantitative score is selected from a stromal response group score, cellular organization group score, androgen group score, proliferation group score, and recurrence score; wherein the stromal response group score comprises the normalized expression levels of BGN, COL1A1, and SFRP4; the cellular organization group score comprises the normalized expression levels of FLNC, GSN, TPM2, and GSTM2; the androgen group score comprises the normalized expression levels of FAM13C, KLK2, AZGP1, and SRD5A2; the proliferation group score comprises the normalized expression level of TPX2; and the recurrence score comprises the stromal response group score, cellular organization group score, androgen group score, and proliferation group score; and wherein an increased stromal response group score, proliferation group score, and recurrence score correlate with an increased likelihood of adverse pathology, non-organ-confined disease, high-grade disease, or high-grade or non-organ confined disease, and an increased cellular organization group score and androgen group score correlate with a decreased likelihood of adverse pathology, non-organ-confined disease, high-grade disease, or high-grade or non-organ confined disease.
 16. The method of claim 14, wherein the biological sample is a tissue sample.
 17. The method of claim 16, wherein the tissue sample is fixed, paraffin-embedded, or fresh, or frozen.
 18. The method of claim 14, wherein the level of one or more RNA transcripts is determined by quantitative RT-PCR.
 19. The method of claim 14, further comprising creating a report summarizing the prediction.
 20. The method of claim 14, wherein the levels of the RNA transcripts, or expression products thereof, of BGN, COL1A1, SFRP4, FLNC, GSN, GSTM2, TPM2, AZGP1, KLK2, FAM13C1, SRD5A2, and TPX2, are determined.